Lecture Notes on the Lambda Calculus 



Department of Mathematics and Statistics 
Dalhousie University, Halifax, Canada 



Peter Selinger

Abstract

This is a set of lecture notes that developed out of courses on the lambda
calculus that I taught at the University of Ottawa in 2001 and at Dalhousie
University in 2007. Topics covered in these notes include the untyped lambda
calculus, the Church-Rosser theorem, combinatory algebras, the simply-typed
lambda calculus, the Curry-Howard isomorphism, weak and strong normalization,
type inference, denotational semantics, complete partial orders, and
the language PCF.

Contents

1 Introduction
1.1 Extensional vs. intensional view of functions
1.2 The lambda calculus
1.3 Untyped vs. typed lambda-calculi
1.4 Lambda calculus and computability
1.5 Connections to computer science
1.6 Connections to logic
1.7 Connections to mathematics

2 The untyped lambda calculus
2.1 Syntax
2.2 Free and bound variables, $\alpha$-equivalence
2.3 Substitution
2.4 Introduction to $\beta$-reduction
2.5 Formal definitions of $\beta$-reduction and $\beta$-equivalence

3 Programming in the untyped lambda calculus
3.1 Booleans
3.2 Natural numbers
3.3 Fixpoints and recursive functions
3.4 Other datatypes: pairs, tuples, lists, trees, etc.

4 The Church-Rosser Theorem
4.1 Extensionality, $\eta$-equivalence, and $\eta$-reduction
4.2 Statement of the Church-Rosser Theorem, and some consequences
4.3 Preliminary remarks on the proof of the Church-Rosser Theorem
4.4 Proof of the Church-Rosser Theorem
4.5 Exercises

5 Combinatory algebras
5.1 Applicative structures
5.2 Combinatory completeness
5.3 Combinatory algebras
5.4 The failure of soundness for combinatory algebras
5.5 Lambda algebras
5.6 Extensional combinatory algebras

6 Simply-typed lambda calculus, propositional logic, and the Curry-Howard isomorphism
6.1 Simple types and simply-typed terms
6.2 Connections to propositional logic
6.3 Propositional intuitionistic logic
6.4 An alternative presentation of natural deduction
6.5 The Curry-Howard Isomorphism
6.6 Reductions in the simply-typed lambda calculus
6.7 A word on Church-Rosser
6.8 Reduction as proof simplification
6.9 Getting mileage out of the Curry-Howard isomorphism
6.10 Disjunction and sum types
6.11 Classical logic vs. intuitionistic logic
6.12 Classical logic and the Curry-Howard isomorphism

7 Polymorphism

8 Weak and strong normalization
8.1 Definitions
8.2 Weak and strong normalization in typed lambda calculus

9 Type inference
9.1 Principal types
9.2 Type templates and type substitutions
9.3 Unifiers
9.4 The unification algorithm
9.5 The type inference algorithm

10 Denotational semantics
10.1 Set-theoretic interpretation
10.2 Soundness
10.3 Completeness

11 The language PCF
11.1 Syntax and typing rules
11.2 Axiomatic equivalence
11.3 Operational semantics
11.4 Big-step semantics
11.5 Operational equivalence
11.6 Operational approximation
11.7 Discussion of operational equivalence
11.8 Operational equivalence and parallel or

12 Complete partial orders
12.1 Why are sets not enough, in general?
12.2 Complete partial orders
12.3 Properties of limits
12.4 Continuous functions
12.5 Pointed cpo's and strict functions
12.6 Products and function spaces
12.7 The interpretation of the simply-typed lambda calculus in complete partial orders
12.8 Cpo's and fixpoints
12.9 Example: Streams

13 Denotational semantics of PCF
13.1 Soundness and adequacy
13.2 Full abstraction

14 Bibliography


1 Introduction 

1.1 Extensional vs. intensional view of functions 

What is a function? In modern mathematics, the prevalent notion is that of
"functions as graphs": each function $f$ has a fixed domain $X$ and codomain
$Y$, and a function $f : X \to Y$ is a set of pairs $f \subseteq X \times Y$
such that for each $x \in X$, there exists exactly one $y \in Y$ such that
$(x, y) \in f$. Two functions $f, g : X \to Y$ are considered equal if they
yield the same output on each input, i.e., $f(x) = g(x)$ for all $x \in X$.
This is called the extensional view of functions, because it specifies
that the only thing observable about a function is how it maps inputs to outputs.

However, before the 20th century, functions were rarely looked at in this way.
An older notion of functions is that of "functions as rules". In this view, to give
a function means to give a rule for how the function is to be calculated. Often,
such a rule can be given by a formula, for instance, the familiar $f(x) = x^2$ or
$g(x) = \sin(e^x)$ from calculus. As before, two functions are extensionally equal if
they have the same input-output behavior; but now we can also speak of another
notion of equality: two functions are intensionally¹ equal if they are given by
(essentially) the same formula.

When we think of functions as given by formulas, it is not always necessary to
know the domain and codomain of a function. Consider for instance the function
$f(x) = x$. This is, of course, the identity function. We may regard it as a function
$f : X \to X$ for any set $X$.

In most of mathematics, the "functions as graphs" paradigm is the most elegant
and appropriate way of dealing with functions. Graphs define a more general class
of functions, because this class includes functions that are not necessarily given
by a rule. Thus, when we prove a mathematical statement such as "any differentiable
function is continuous", we really mean this is true for all functions (in the
mathematical sense), not just those functions for which a rule can be given.

On the other hand, in computer science, the "functions as rules" paradigm is often 
more appropriate. Think of a computer program as defining a function that maps 
input to output. Most computer programmers (and users) do not only care about 
the extensional behavior of a program (which inputs are mapped to which out- 
puts), but also about how the output is calculated: How much time does it take? 
How much memory and disk space is used in the process? How much communi- 
cation bandwidth is used? These are intensional questions having to do with the 
particular way in which a function was defined. 
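
To make the distinction concrete, here is a small Python sketch (the function names are chosen only for this illustration). It defines two rules that compute the sum 0 + 1 + ... + n. The two functions are extensionally equal on the inputs we check, but they are intensionally different: one performs about n additions, the other a fixed number of operations.

```python
def sum_by_loop(n):
    """Add 0 + 1 + ... + n one term at a time (about n additions)."""
    total = 0
    for k in range(n + 1):
        total += k
    return total


def sum_by_formula(n):
    """Use the closed form n(n+1)/2 (a fixed number of operations)."""
    return n * (n + 1) // 2


# Same input-output behaviour on every input tested here, although the
# two rules are different and have different running times.
agree = all(sum_by_loop(n) == sum_by_formula(n) for n in range(200))
```

A finite test such as this one can only suggest extensional equality; establishing it for all inputs requires a proof.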



¹ Note that this word is intentionally spelled "intensionally".



1.2 The lambda calculus 

The lambda calculus is a theory of functions as formulas. It is a system for ma- 
nipulating functions as expressions. 

Let us begin by looking at another well-known language of expressions, namely
arithmetic. Arithmetic expressions are made up from variables ($x, y, z, \ldots$),
numbers ($1, 2, 3, \ldots$), and operators ("$+$", "$-$", "$\times$", etc.). An
expression such as $x + y$ stands for the result of an addition (as opposed to an
instruction to add, or the statement that something is being added). The great
advantage of this language is that expressions can be nested without any need to
mention the intermediate results explicitly. So for instance, we write

    $A = (x + y) \times z^2$,

and not

    let $w = x + y$, then let $u = z^2$, then let $A = w \times u$.

The latter notation would be tiring and cumbersome to manipulate.

The lambda calculus extends the idea of an expression language to include
functions. Where we normally write

    Let $f$ be the function $x \mapsto x^2$. Then consider $A = f(5)$,

in the lambda calculus we just write

    $A = (\lambda x.x^2)(5)$.

The expression $\lambda x.x^2$ stands for the function that maps $x$ to $x^2$ (as
opposed to the statement that $x$ is being mapped to $x^2$). As in arithmetic, we
use parentheses to group terms.

It is understood that the variable $x$ is a local variable in the term
$\lambda x.x^2$. Thus, it does not make any difference if we write $\lambda y.y^2$
instead. A local variable is also called a bound variable.

One advantage of the lambda notation is that it allows us to easily talk about
higher-order functions, i.e., functions whose inputs and/or outputs are themselves
functions. An example is the operation $f \mapsto f \circ f$ in mathematics, which
takes a function $f$ and maps it to $f \circ f$, the composition of $f$ with itself.
In the lambda calculus, $f \circ f$ is written as

    $\lambda x.f(f(x))$,

and the operation that maps $f$ to $f \circ f$ is written as

    $\lambda f.\lambda x.f(f(x))$.

The evaluation of higher-order functions can get somewhat complex; as an
example, consider the following expression:

    $\bigl((\lambda f.\lambda x.f(f(x)))(\lambda y.y^2)\bigr)(5)$

Convince yourself that this evaluates to 625. Another example is given in the
following exercise:

Exercise 1. Evaluate the lambda-expression

    $\bigl(\bigl((\lambda f.\lambda x.f(f(f(x))))(\lambda g.\lambda y.g(g(y)))\bigr)(\lambda z.z + 1)\bigr)(0)$.



We will soon introduce some conventions for reducing the number of parentheses 
in such expressions. 
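
The first higher-order expression of this section can be checked mechanically. The following sketch transcribes it into Python, whose `lambda` construct plays the role of the lambda abstraction; the names `twice` and `square` are introduced only for this illustration.

```python
# λf.λx.f(f(x)): the operation that maps f to the composition of f with itself.
twice = lambda f: lambda x: f(f(x))

# λy.y²
square = lambda y: y ** 2

# ((λf.λx.f(f(x)))(λy.y²))(5), i.e. square(square(5))
result = twice(square)(5)
```

Here `result` is `(5 ** 2) ** 2`, in agreement with the value 625 stated above.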

1.3 Untyped vs. typed lambda-calculi 

We have already mentioned that, when considering "functions as rules", it is not
always necessary to know the domain and codomain of a function ahead of time.
The simplest example is the identity function $f = \lambda x.x$, which can have any
set $X$ as its domain and codomain, as long as domain and codomain are equal. We say
that $f$ has the type $X \to X$. Another example is the function
$g = \lambda f.\lambda x.f(f(x))$ that we encountered above. One can check that $g$
maps any function $f : X \to X$ to a function $g(f) : X \to X$. In this case, we say
that the type of $g$ is

    $(X \to X) \to (X \to X)$.

By being flexible about domains and codomains, we are able to manipulate
functions in ways that would not be possible in ordinary mathematics. For instance,
if $f = \lambda x.x$ is the identity function, then we have $f(x) = x$ for any $x$.
In particular, we can take $x = f$, and we get

    $f(f) = (\lambda x.x)(f) = f$.

Note that the equation $f(f) = f$ never makes sense in ordinary mathematics,
since it is not possible (for set-theoretic reasons) for a function to be included in
its own domain.
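
Untyped programming languages allow the same kind of self-application. As a small illustration (a sketch in Python, with the name `identity` chosen here), applying the identity function to itself returns the very same function object:

```python
# f = λx.x
identity = lambda x: x

# f(f) = (λx.x)(f) = f
self_applied = identity(identity)
```

After this, `self_applied is identity` holds, mirroring the equation $f(f) = f$.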

As another example, let $\omega = \lambda x.x(x)$.

Exercise 2. What is $\omega(\omega)$?

We have several options regarding types in the lambda calculus. 

• Untyped lambda calculus. In the untyped lambda calculus, we never specify 
the type of any expression. Thus we never specify the domain or codomain 
of any function. This gives us maximal flexibility. It is also very unsafe, 
because we might run into situations where we try to apply a function to an 
argument that it does not understand. 

• Simply-typed lambda calculus. In the simply-typed lambda calculus, we
always completely specify the type of every expression. This is very similar
to the situation in set theory. We never allow the application of a function
to an argument unless the type of the argument is the same as the domain of
the function. Thus, terms such as $f(f)$ are ruled out, even if $f$ is the identity
function.

• Polymorphically typed lambda calculus. This is an intermediate situation,
where we may specify, for instance, that a term has a type of the form
$X \to X$ for all $X$, without actually specifying $X$.

As we will see, each of these alternatives has dramatically different properties 
from the others. 

1.4 Lambda calculus and computability 

In the 1930's, several people were interested in the question: what does it mean for
a function $f : \mathbb{N} \to \mathbb{N}$ to be computable? An informal definition
of computability is that there should be a pencil-and-paper method allowing a trained
person to calculate $f(n)$, for any given $n$. The concept of a pencil-and-paper
method is not so easy to formalize. Three different researchers attempted to do so,
resulting in the following definitions of computability:

1. Turing defined an idealized computer we now call a Turing machine, and
postulated that a function is computable (in the intuitive sense) if and only
if it can be computed by such a machine.

2. Gödel defined the class of general recursive functions as the smallest set of
functions containing all the constant functions, the successor function, and
closed under certain operations (such as compositions and recursion). He
postulated that a function is computable (in the intuitive sense) if and only
if it is general recursive.

3. Church defined an idealized programming language called the lambda cal- 
culus, and postulated that a function is computable (in the intuitive sense) if 
and only if it can be written as a lambda term. 

It was proved by Church, Kleene, Rosser, and Turing that all three computational 
models were equivalent to each other, i.e., each model defines the same class 
of computable functions. Whether or not they are equivalent to the "intuitive" 
notion of computability is a question that cannot be answered, because there is no 
formal definition of "intuitive computability". The assertion that they are in fact 
equivalent to intuitive computability is known as the Church-Turing thesis. 

1.5 Connections to computer science 

The lambda calculus is a very idealized programming language; arguably, it is the 
simplest possible programming language that is Turing complete. Because of its 
simplicity, it is a useful tool for defining and proving properties of programs. 

Many real-world programming languages can be regarded as extensions of the
lambda calculus. This is true for all functional programming languages, a class
that includes Lisp, Scheme, Haskell, and ML. Such languages combine the lambda
calculus with additional features, such as data types, input/output, side effects,
updatable memory, object-oriented features, etc. The lambda calculus provides
a vehicle for studying such extensions, in isolation and jointly, to see how they
will affect each other, and to prove properties of programming languages (such as:
a well-formed program will not crash).

The lambda calculus is also a tool used in compiler construction, see e.g. [8, 9].

1.6 Connections to logic 

In the 19th and early 20th centuries, there was a philosophical dispute among
mathematicians about what a proof is. The so-called constructivists, such as
Brouwer and Heyting, believed that to prove that a mathematical object exists, one
must be able to construct it explicitly. Classical logicians, such as Hilbert, held
that it is sufficient to derive a contradiction from the assumption that it doesn't
exist.



Ironically, one of the better-known examples of a proof that isn't constructive is
Brouwer's proof of his own fixpoint theorem, which states that every continuous
function on the unit disc has a fixpoint. The proof is by contradiction and does not
give any information on the location of the fixpoint.

The connection between lambda calculus and constructive logics is via the
"proofs-as-programs" paradigm. To a constructivist, a proof (of an existence
statement) must be a "construction", i.e., a program. The lambda calculus is a
notation for such programs, and it can also be used as a notation for
(constructive) proofs.

For the most part, constructivism has not prevailed as a philosophy in mainstream 
mathematics. However, there has been renewed interest in constructivism in the 
second half of the 20th century. The reason is that constructive proofs give more 
information than classical ones, and in particular, they allow one to compute solu- 
tions to problems (as opposed to merely knowing the existence of a solution). The 
resulting algorithms can be useful in computational mathematics, for instance in 
computer algebra systems. 

1.7 Connections to mathematics 

One way to study the lambda calculus is to give mathematical models of it, i.e., 
to provide spaces in which lambda terms can be given meaning. Such models are 
constructed using methods from algebra, partially ordered sets, topology, category 
theory, and other areas of mathematics. 



2 The untyped lambda calculus

2.1 Syntax

The lambda calculus is a formal language. The expressions of the language are 
called lambda terms, and we will give rules for manipulating them. 

Definition. Assume given an infinite set $V$ of variables, denoted by $x, y, z$ etc.
The set of lambda terms is given by the following Backus-Naur Form:

    Lambda terms: $M, N ::= x \mid (MN) \mid (\lambda x.M)$

The above Backus-Naur Form (BNF) is a convenient abbreviation for the following
equivalent, more traditionally mathematical definition:

Definition. Assume given an infinite set $V$ of variables. Let $A$ be an alphabet
consisting of the elements of $V$, and the special symbols "(", ")", "$\lambda$",
and ".". Let $A^*$ be the set of strings (finite sequences) over the alphabet $A$.
The set of lambda terms is the smallest subset $\Lambda \subseteq A^*$ such that:

• Whenever $x \in V$ then $x \in \Lambda$.

• Whenever $M, N \in \Lambda$ then $(MN) \in \Lambda$.

• Whenever $x \in V$ and $M \in \Lambda$ then $(\lambda x.M) \in \Lambda$.

Comparing the two equivalent definitions, we see that the Backus-Naur Form is
a convenient notation because: (1) the definition of the alphabet can be left
implicit, (2) the use of distinct meta-symbols for different syntactic classes
($x, y, z$ for variables and $M, N$ for terms) eliminates the need to explicitly
quantify over the sets $V$ and $\Lambda$. In the future, we will always present
syntactic definitions in the BNF style.

The following are some examples of lambda terms:

    $(\lambda x.x)$    $((\lambda x.(xx))(\lambda y.(yy)))$    $(\lambda f.(\lambda x.(f(fx))))$

Note that in the definition of lambda terms, we have built in enough mandatory
parentheses to ensure that every term $M \in \Lambda$ can be uniquely decomposed into
subterms. This means, each term $M \in \Lambda$ is of precisely one of the forms $x$,
$(MN)$, $(\lambda x.M)$. Terms of these three forms are called variables,
applications, and lambda abstractions, respectively.
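
The three forms of terms translate directly into a small data structure. The following Python sketch is one possible encoding, not part of the notes themselves: the tagged tuples and the helper names `Var`, `App`, `Lam`, and `show` are assumptions made for illustration. The function `show` prints a term with all of the mandatory parentheses of the official definition.

```python
# Terms as tagged tuples: ("var", x), ("app", M, N), ("lam", x, M).
def Var(x):
    return ("var", x)

def App(m, n):
    return ("app", m, n)

def Lam(x, m):
    return ("lam", x, m)


def show(t):
    """Render a term in the fully parenthesized syntax of the definition."""
    if t[0] == "var":
        return t[1]
    if t[0] == "app":
        return "(" + show(t[1]) + show(t[2]) + ")"
    return "(λ" + t[1] + "." + show(t[2]) + ")"


# The first two example terms from the text.
identity = Lam("x", Var("x"))
self_app = App(Lam("x", App(Var("x"), Var("x"))),
               Lam("y", App(Var("y"), Var("y"))))
```

Because every term is built by exactly one of the three constructors, a function on terms such as `show` needs exactly one case per form.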

We use the notation $(MN)$, rather than $M(N)$, to denote the application of a
function $M$ to an argument $N$. Thus, in the lambda calculus, we write $(fx)$ instead
of the more traditional $f(x)$. This allows us to economize more efficiently on the
use of parentheses. To avoid having to write an excessive number of parentheses,
we establish the following conventions for writing lambda terms:

Convention. • We omit outermost parentheses. For instance, we write $MN$
instead of $(MN)$.

• Applications associate to the left; thus, $MNP$ means $(MN)P$. This is
convenient when applying a function to a number of arguments, as in $fxyz$,
which means $((fx)y)z$.

• The body of a lambda abstraction (the part after the dot) extends as far
to the right as possible. In particular, $\lambda x.MN$ means $\lambda x.(MN)$,
and not $(\lambda x.M)N$.

• Multiple lambda abstractions can be contracted; thus $\lambda xyz.M$ will
abbreviate $\lambda x.\lambda y.\lambda z.M$.

It is important to note that this convention is only for notational convenience; it 
does not affect the "official" definition of lambda terms. 

Exercise 3. (a) Write the following terms with as few parentheses as possible,
without changing the meaning or structure of the terms:

(i) $(\lambda x.(\lambda y.(\lambda z.((xz)(yz)))))$,
(ii) $(((ab)(cd))((ef)(gh)))$,
(iii) $(\lambda x.((\lambda y.(yx))(\lambda v.v)z)u)(\lambda w.w)$.

(b) Restore all the dropped parentheses in the following terms, without
changing the meaning or structure of the terms:

(i) $xxxx$,
(ii) $\lambda x.x\lambda y.y$,
(iii) $\lambda x.(x\lambda y.yxx)x$.

2.2 Free and bound variables, $\alpha$-equivalence

In our informal discussion of lambda terms, we have already pointed out that the
terms $\lambda x.x$ and $\lambda y.y$, which differ only in the name of their bound
variable, are essentially the same. We will say that such terms are
$\alpha$-equivalent, and we write $M =_\alpha N$. In the rare event that we want to
say that two terms are precisely equal, symbol for symbol, we say that $M$ and $N$
are identical and we write $M \equiv N$. We reserve "$=$" as a generic symbol used
for different purposes.

An occurrence of a variable $x$ inside a term of the form $\lambda x.N$ is said to
be bound. The corresponding $\lambda x$ is called a binder, and we say that the
subterm $N$ is the scope of the binder. A variable occurrence that is not bound is
free. Thus, for example, in the term

    $M = (\lambda x.xy)(\lambda y.yz)$,

$x$ is bound, but $z$ is free. The variable $y$ has both a free and a bound
occurrence. The set of free variables of $M$ is $\{y, z\}$.

More generally, the set of free variables of a term $M$ is denoted $FV(M)$, and it is
defined formally as follows:

\[
\begin{aligned}
FV(x) &= \{x\}, \\
FV(MN) &= FV(M) \cup FV(N), \\
FV(\lambda x.M) &= FV(M) \setminus \{x\}.
\end{aligned}
\]

This definition is an example of a definition by recursion on terms. In other words,
in defining $FV(M)$, we assume that we have already defined $FV(N)$ for all
subterms $N$ of $M$. We will often encounter such recursive definitions, as well as
inductive proofs.
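
Recursion on terms corresponds to a recursive function with one case per form of term. A minimal sketch in Python, assuming terms are encoded as tagged tuples `("var", x)`, `("app", M, N)`, and `("lam", x, M)` (an encoding chosen here for illustration):

```python
def free_vars(t):
    """FV(t), computed by recursion on the structure of the term t."""
    if t[0] == "var":                       # FV(x) = {x}
        return {t[1]}
    if t[0] == "app":                       # FV(MN) = FV(M) ∪ FV(N)
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}         # FV(λx.M) = FV(M) \ {x}


# M = (λx.xy)(λy.yz), the example term from the text.
M = ("app",
     ("lam", "x", ("app", ("var", "x"), ("var", "y"))),
     ("lam", "y", ("app", ("var", "y"), ("var", "z"))))
```

For this term, `free_vars(M)` is the set `{"y", "z"}`, as computed by hand above.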

Before we can formally define $\alpha$-equivalence, we need to define what it means
to rename a variable in a term. If $x, y$ are variables, and $M$ is a term, we write
$M\{y/x\}$ for the result of renaming $x$ as $y$ in $M$. Renaming is formally defined
as follows:

\[
\begin{aligned}
x\{y/x\} &= y, \\
z\{y/x\} &= z, && \text{if } x \neq z, \\
(MN)\{y/x\} &= (M\{y/x\})(N\{y/x\}), \\
(\lambda x.M)\{y/x\} &= \lambda y.(M\{y/x\}), \\
(\lambda z.M)\{y/x\} &= \lambda z.(M\{y/x\}), && \text{if } x \neq z.
\end{aligned}
\]

Note that this kind of renaming replaces all occurrences of $x$ by $y$, whether free,
bound, or binding. We will only apply it in cases where $y$ does not already occur
in $M$.
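
The five clauses of the renaming operation can be transcribed almost literally. The sketch below assumes the tagged-tuple encoding `("var", x)`, `("app", M, N)`, `("lam", x, M)` for terms; both the encoding and the function name `rename` are choices made for this illustration.

```python
def rename(t, y, x):
    """t{y/x}: replace every occurrence of x by y (free, bound, or binding)."""
    if t[0] == "var":
        # x{y/x} = y, and z{y/x} = z if x != z
        return ("var", y) if t[1] == x else t
    if t[0] == "app":
        # (MN){y/x} = (M{y/x})(N{y/x})
        return ("app", rename(t[1], y, x), rename(t[2], y, x))
    # (λx.M){y/x} = λy.(M{y/x}), and (λz.M){y/x} = λz.(M{y/x}) if x != z
    binder = y if t[1] == x else t[1]
    return ("lam", binder, rename(t[2], y, x))


# (λx.xz){w/x} = λw.wz
renamed = rename(("lam", "x", ("app", ("var", "x"), ("var", "z"))), "w", "x")
```

As in the text, the function is only meant to be called when `y` does not already occur in the term; it performs no check of its own.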

Finally, we are in a position to formally define what it means for two terms to be
"the same up to renaming of bound variables":

Definition. We define $\alpha$-equivalence to be the smallest congruence relation
$=_\alpha$ on lambda terms, such that for all terms $M$ and all variables $y$ that
do not occur in $M$,

    $\lambda x.M =_\alpha \lambda y.(M\{y/x\})$.

Recall that a relation on lambda terms is an equivalence relation if it satisfies rules
(refl), (symm), and (trans). It is a congruence if it also satisfies rules (cong) and
($\xi$). Thus, by definition, $\alpha$-equivalence is the smallest relation on lambda
terms satisfying the six rules in Table 1.

    (refl)      $\dfrac{}{M = M}$
    (symm)      $\dfrac{M = N}{N = M}$
    (trans)     $\dfrac{M = N \quad N = P}{M = P}$
    (cong)      $\dfrac{M = M' \quad N = N'}{MN = M'N'}$
    ($\xi$)     $\dfrac{M = M'}{\lambda x.M = \lambda x.M'}$
    ($\alpha$)  $\dfrac{y \notin M}{\lambda x.M = \lambda y.(M\{y/x\})}$

Table 1: The rules for alpha-equivalence

It is easy to prove by induction that any lambda term is $\alpha$-equivalent to
another term in which the names of all bound variables are distinct from each other
and from any free variables. Thus, when we manipulate lambda terms in theory and
in practice, we can (and will) always assume without loss of generality that bound
variables have been renamed to be distinct. This convention is called Barendregt's
variable convention.
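
In an implementation, one rarely searches for a derivation from the six rules. A common alternative, sketched below in Python, is to compare two terms while recording for each bound variable the depth of its binder; two terms are then $\alpha$-equivalent when they have the same shape, their bound variables point to binders at the same depth, and their free variables have the same names. This is a different technique from the definition above, and the tagged-tuple encoding of terms and the name `alpha_eq` are assumptions of the sketch.

```python
def alpha_eq(m, n, env_m=None, env_n=None, depth=0):
    """Decide whether m and n are equal up to renaming of bound variables."""
    env_m = env_m or {}   # bound variable of m -> depth of its binder
    env_n = env_n or {}   # bound variable of n -> depth of its binder
    if m[0] != n[0]:
        return False
    if m[0] == "var":
        if m[1] in env_m or n[1] in env_n:
            # Bound occurrences must refer to binders at the same depth;
            # .get returns None for a free variable, so bound vs. free fails.
            return env_m.get(m[1]) == env_n.get(n[1])
        return m[1] == n[1]           # free variables must have the same name
    if m[0] == "app":
        return (alpha_eq(m[1], n[1], env_m, env_n, depth)
                and alpha_eq(m[2], n[2], env_m, env_n, depth))
    # Lambda abstraction: record both binders at the current depth.
    return alpha_eq(m[2], n[2],
                    {**env_m, m[1]: depth},
                    {**env_n, n[1]: depth},
                    depth + 1)


id_x = ("lam", "x", ("var", "x"))      # λx.x
id_y = ("lam", "y", ("var", "y"))      # λy.y
const_y = ("lam", "x", ("var", "y"))   # λx.y, where y is free
```

With these definitions, `alpha_eq(id_x, id_y)` holds, while `alpha_eq(const_y, id_y)` does not, because `y` is free in the first term and bound in the second.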

As a remark, the notions of free and bound variables and $\alpha$-equivalence are of
course not particular to the lambda calculus; they appear in many standard
mathematical notations, as well as in computer science. Here are four examples where
the variable $x$ is bound.

    $\int_0^1 x^2 \, dx$

    $\sum_{x=1}^{10} \frac{1}{x}$

    $\lim_{x \to \infty} e^{-x}$

    int succ(int x) { return x+1; }

2.3 Substitution 

In the previous section, we defined a renaming operation, which allowed us to
replace a variable by another variable in a lambda term. Now we turn to a less
trivial operation, called substitution, which allows us to replace a variable by a
lambda term. We will write $M[N/x]$ for the result of replacing $x$ by $N$ in $M$. The
definition of substitution is complicated by two circumstances:

1. We should only replace free variables. This is because the names of bound
variables are considered immaterial, and should not affect the result of a
substitution. Thus, $x(\lambda xy.x)[N/x]$ is $N(\lambda xy.x)$, and not
$N(\lambda xy.N)$.

2. We need to avoid unintended "capture" of free variables. Consider for
example the term $M = \lambda x.yx$, and let $N = \lambda z.xz$. Note that $x$ is
free in $N$ and bound in $M$. What should be the result of substituting $N$ for $y$
in $M$? If we do this naively, we get

    $M[N/y] = (\lambda x.yx)[N/y] = \lambda x.Nx = \lambda x.(\lambda z.xz)x$.

However, this is not what we intended, since the variable $x$ was free in $N$,
and during the substitution, it got bound. We need to account for the fact
that the $x$ that was bound in $M$ was not the "same" $x$ as the one that was
free in $N$. The proper thing to do is to rename the bound variable before the
substitution:

    $M[N/y] = (\lambda x'.yx')[N/y] = \lambda x'.Nx' = \lambda x'.(\lambda z.xz)x'$.

Thus, the operation of substitution forces us to sometimes rename a bound
variable. In this case, it is best to pick a variable from $V$ that has not been used
yet as the new name of the bound variable. A variable that is currently unused is
called fresh. The reason we stipulated that the set $V$ is infinite was to make sure
a fresh variable is always available when we need one.

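Definition. The (capture-avoiding) substitution of $N$ for free occurrences of $x$ in
$M$, in symbols $M[N/x]$, is defined as follows:

\[
\begin{aligned}
x[N/x] &= N, \\
y[N/x] &= y, && \text{if } x \neq y, \\
(MP)[N/x] &= (M[N/x])(P[N/x]), \\
(\lambda x.M)[N/x] &= \lambda x.M, \\
(\lambda y.M)[N/x] &= \lambda y.(M[N/x]), && \text{if } x \neq y \text{ and } y \notin FV(N), \\
(\lambda y.M)[N/x] &= \lambda y'.(M\{y'/y\}[N/x]), && \text{if } x \neq y,\ y \in FV(N), \text{ and } y' \text{ fresh.}
\end{aligned}
\]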
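
The six clauses can be sketched as a recursive Python function. The sketch makes several assumptions that are not part of the notes: terms are encoded as tagged tuples `("var", x)`, `("app", M, N)`, `("lam", x, M)`, and a fresh variable is chosen as the first of the hypothetical names `v0`, `v1`, `v2`, ... that occurs neither in the body, nor in $N$, nor as $x$.

```python
import itertools


def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}


def all_vars(t):
    """Every variable name occurring in t: free, bound, or binding."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return all_vars(t[1]) | all_vars(t[2])
    return {t[1]} | all_vars(t[2])


def rename(t, y, x):
    """t{y/x}: replace all occurrences of x by y, including binders."""
    if t[0] == "var":
        return ("var", y) if t[1] == x else t
    if t[0] == "app":
        return ("app", rename(t[1], y, x), rename(t[2], y, x))
    return ("lam", y if t[1] == x else t[1], rename(t[2], y, x))


def fresh(avoid):
    """First of v0, v1, v2, ... that is not in the set `avoid`."""
    return next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)


def subst(m, n, x):
    """m[n/x]: capture-avoiding substitution of n for the free x in m."""
    if m[0] == "var":
        return n if m[1] == x else m                   # clauses 1 and 2
    if m[0] == "app":
        return ("app", subst(m[1], n, x), subst(m[2], n, x))   # clause 3
    y, body = m[1], m[2]
    if y == x:
        return m                                       # clause 4: x is rebound
    if y not in free_vars(n):
        return ("lam", y, subst(body, n, x))           # clause 5
    y2 = fresh(all_vars(body) | all_vars(n) | {x})     # clause 6
    return ("lam", y2, subst(rename(body, y2, y), n, x))


# The capture example from the text: M = λx.yx and N = λz.xz.
M = ("lam", "x", ("app", ("var", "y"), ("var", "x")))
N = ("lam", "z", ("app", ("var", "x"), ("var", "z")))
result = subst(M, N, "y")
```

Here `result` is the term $\lambda v_0.(\lambda z.xz)v_0$: the bound variable has been renamed, so the free $x$ of $N$ stays free.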

The definition of substitution has one technical flaw: in the last clause, we did not
specify which fresh variable to pick, and thus, technically, substitution is not
well-defined. One way to solve this problem is to declare all lambda terms to be
identified up to $\alpha$-equivalence, and to prove that substitution is in fact
well-defined modulo $\alpha$-equivalence. Another way would be to specify which
variable $y'$ to choose: for instance, assume that there is a well-ordering on the
set $V$ of variables, and stipulate that $y'$ should be chosen to be the least
variable that does not occur in either $M$ or $N$.

2.4 Introduction to $\beta$-reduction

Convention. From now on, unless stated otherwise, we identify lambda terms up
to $\alpha$-equivalence. This means, when we speak of lambda terms being "equal", we
mean that they are $\alpha$-equivalent. Formally, we regard lambda terms as
equivalence classes modulo $\alpha$-equivalence. We will often use the ordinary
equality symbol $M = N$ to denote $\alpha$-equivalence.

The process of evaluating lambda terms by "plugging arguments into functions"
is called $\beta$-reduction. A term of the form $(\lambda x.M)N$, which consists of a
lambda abstraction applied to another term, is called a $\beta$-redex. We say that it
reduces to $M[N/x]$, and we call the latter term the reduct. We reduce lambda terms by
finding a subterm that is a redex, and then replacing that redex by its reduct. We
repeat this as many times as we like, or until there are no more redexes left to
reduce. A lambda term without any $\beta$-redexes is said to be in $\beta$-normal form.

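For example, the lambda term $(\lambda x.y)((\lambda z.zz)(\lambda w.w))$ can be
reduced as follows. Here, we underline each redex just before reducing it:

\[
\begin{aligned}
(\lambda x.y)(\underline{(\lambda z.zz)(\lambda w.w)})
  &\to_\beta (\lambda x.y)(\underline{(\lambda w.w)(\lambda w.w)}) \\
  &\to_\beta \underline{(\lambda x.y)(\lambda w.w)} \\
  &\to_\beta y.
\end{aligned}
\]

The last term, $y$, has no redexes and is thus in normal form. We could reduce the
same term differently, by choosing the redexes in a different order:

\[
\underline{(\lambda x.y)((\lambda z.zz)(\lambda w.w))} \to_\beta y.
\]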

As we can see from this example: 

- reducing a redex can create new redexes, 

- reducing a redex can delete some other redexes, 

- the number of steps that it takes to reach a normal form can vary, depending 
on the order in which the redexes are reduced. 

We can also see that the final result, $y$, does not seem to depend on the order in
which the redexes are reduced. In fact, this is true in general, as we will prove
later.

If $M$ and $M'$ are terms such that $M \twoheadrightarrow_\beta M'$, and if $M'$ is
in normal form, then we say that $M$ evaluates to $M'$.

Not every term evaluates to something; some terms can be reduced forever without
reaching a normal form. The following is an example:

\[
\begin{aligned}
(\lambda x.xx)(\lambda y.yyy)
  &\to_\beta (\lambda y.yyy)(\lambda y.yyy) \\
  &\to_\beta (\lambda y.yyy)(\lambda y.yyy)(\lambda y.yyy) \\
  &\to_\beta \ldots
\end{aligned}
\]

This example also shows that the size of a lambda term need not decrease during
reduction; it can increase, or remain the same. The term
$(\lambda x.xx)(\lambda x.xx)$, which we encountered in Section 1, is another example
of a lambda term that does not reach a normal form.
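
The reduction process described in this section can be sketched as a small evaluator. The Python code below is one possible illustration under stated assumptions: terms are tagged tuples `("var", x)`, `("app", M, N)`, `("lam", x, M)`; the strategy always reduces the leftmost-outermost redex (the notes allow any order); fresh names are taken from the hypothetical sequence `v0`, `v1`, ...; and `normalize` gives up after a fixed number of steps, since some terms have no normal form.

```python
import itertools


def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}


def names(t):
    """All variable names occurring in t: free, bound, or binding."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return names(t[1]) | names(t[2])
    return {t[1]} | names(t[2])


def subst(m, n, x):
    """Capture-avoiding substitution m[n/x]."""
    if m[0] == "var":
        return n if m[1] == x else m
    if m[0] == "app":
        return ("app", subst(m[1], n, x), subst(m[2], n, x))
    y, body = m[1], m[2]
    if y == x:
        return m
    if y in free_vars(n):
        # Rename the bound variable to a fresh one before substituting.
        avoid = names(body) | names(n) | {x}
        y2 = next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)
        body, y = subst(body, ("var", y2), y), y2
    return ("lam", y, subst(body, n, x))


def step(t):
    """Reduce the leftmost-outermost redex; None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                 # t is itself a redex (λx.M)N
            return subst(f[2], a, f[1])
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None                           # a variable has no redex


def normalize(t, limit=1000):
    """Repeat `step` until a normal form is reached, at most `limit` times."""
    for _ in range(limit):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no normal form found within the step limit")


def lam(x, body):
    return ("lam", x, body)

def app(m, n):
    return ("app", m, n)

def var(x):
    return ("var", x)


# (λx.y)((λz.zz)(λw.w)), the example term from the text.
example = app(lam("x", var("y")),
              app(lam("z", app(var("z"), var("z"))), lam("w", var("w"))))

# (λx.xx)(λx.xx), which reduces to itself.
omega = app(lam("x", app(var("x"), var("x"))),
            lam("x", app(var("x"), var("x"))))
```

With this strategy, `normalize(example)` reaches `("var", "y")` in a single step, matching the second reduction sequence above. For `omega`, `step(omega)` returns `omega` again, so `normalize(omega)` raises the `RuntimeError` once the step limit is exhausted.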



2.5 Formal definitions of $\beta$-reduction and $\beta$-equivalence

The concept of $\beta$-reduction can be defined formally as follows:

Definition. We define single-step $\beta$-reduction to be the smallest relation
$\to_\beta$ on terms satisfying:

    ($\beta$)     $\dfrac{}{(\lambda x.M)N \to_\beta M[N/x]}$
    (cong$_1$)    $\dfrac{M \to_\beta M'}{MN \to_\beta M'N}$
    (cong$_2$)    $\dfrac{N \to_\beta N'}{MN \to_\beta MN'}$
    ($\xi$)       $\dfrac{M \to_\beta M'}{\lambda x.M \to_\beta \lambda x.M'}$

Thus, $M \to_\beta M'$ iff $M'$ is obtained from $M$ by reducing a single
$\beta$-redex of $M$.

Definition. We write $M \twoheadrightarrow_\beta M'$ if $M$ reduces to $M'$ in zero
or more steps. Formally, $\twoheadrightarrow_\beta$ is defined to be the reflexive
transitive closure of $\to_\beta$, i.e., the smallest reflexive transitive relation
containing $\to_\beta$.

Finally, $\beta$-equivalence is obtained by allowing reduction steps as well as inverse
reduction steps, i.e., by making $\to_\beta$ symmetric:

Definition. We write $M =_\beta M'$ if $M$ can be transformed into $M'$ by zero or
more reduction steps and/or inverse reduction steps. Formally, $=_\beta$ is defined to
be the reflexive symmetric transitive closure of $\to_\beta$, i.e., the smallest
equivalence relation containing $\to_\beta$.

Exercise 4. This definition of $\beta$-equivalence is slightly different from the one
given in class. Prove that they are in fact the same.



3 Programming in the untyped lambda calculus 

One of the amazing facts about the untyped lambda calculus is that we can use it 
to encode data, such as booleans and natural numbers, as well as programs that 
operate on the data. This can be done purely within the lambda calculus, without 
adding any additional syntax or axioms. 

We will often have occasion to give names to particular lambda terms; we will 
usually use boldface letters for such names. 

3.1 Booleans 

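We begin by defining two lambda terms to encode the truth values "true" and
"false":

    $\mathbf{T} = \lambda xy.x$
    $\mathbf{F} = \lambda xy.y$

Let $\mathbf{and}$ be the term $\lambda ab.ab\mathbf{F}$. Verify the following:

    $\mathbf{and}\,\mathbf{T}\,\mathbf{T} \twoheadrightarrow_\beta \mathbf{T}$
    $\mathbf{and}\,\mathbf{T}\,\mathbf{F} \twoheadrightarrow_\beta \mathbf{F}$
    $\mathbf{and}\,\mathbf{F}\,\mathbf{T} \twoheadrightarrow_\beta \mathbf{F}$
    $\mathbf{and}\,\mathbf{F}\,\mathbf{F} \twoheadrightarrow_\beta \mathbf{F}$

Note that $\mathbf{T}$ and $\mathbf{F}$ are normal forms, so we can really say that a
term such as $\mathbf{and}\,\mathbf{T}\,\mathbf{T}$ evaluates to $\mathbf{T}$. We say
that $\mathbf{and}$ encodes the boolean function "and". It is understood that this
coding is with respect to the particular coding of "true" and "false". We don't claim
that $\mathbf{and}\,MN$ evaluates to anything meaningful if $M$ or $N$ are terms
other than $\mathbf{T}$ and $\mathbf{F}$.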

Incidentally, there is nothing unique about the term $\lambda ab.ab\mathbf{F}$. It is
one of many possible ways of encoding the "and" function. Another possibility is
$\lambda ab.bab$.

Exercise 5. Find lambda terms $\mathbf{or}$ and $\mathbf{not}$ that encode the boolean
functions "or" and "not". Can you find more than one term?

Moreover, we define the term $\mathbf{if\_then\_else} = \lambda x.x$. This term
behaves like an "if-then-else" function; specifically, we have

    $\mathbf{if\_then\_else}\,\mathbf{T}MN \twoheadrightarrow_\beta M$
    $\mathbf{if\_then\_else}\,\mathbf{F}MN \twoheadrightarrow_\beta N$

for all lambda terms $M$, $N$.
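
The boolean encodings of this section can be tried out with Python's own `lambda`, writing a multi-argument abstraction such as $\lambda xy.x$ as nested one-argument functions. This is only a sketch: Python evaluates arguments eagerly, which makes no difference for these terms, and the helper `to_bool` and the upper-case names are introduced here solely to inspect results.

```python
T = lambda x: lambda y: x               # T = λxy.x
F = lambda x: lambda y: y               # F = λxy.y

AND = lambda a: lambda b: a(b)(F)       # and = λab.abF
AND_ALT = lambda a: lambda b: b(a)(b)   # the alternative λab.bab

IF_THEN_ELSE = lambda x: x              # if_then_else = λx.x


def to_bool(b):
    """Decode an encoded boolean by applying it to Python's True and False."""
    return b(True)(False)


# Truth table of `and` under both encodings, in the order TT, TF, FT, FF.
table = [to_bool(AND(a)(b)) for a in (T, F) for b in (T, F)]
table_alt = [to_bool(AND_ALT(a)(b)) for a in (T, F) for b in (T, F)]
```

Both `table` and `table_alt` come out as `[True, False, False, False]`, and `IF_THEN_ELSE(T)("M")("N")` selects `"M"` while `IF_THEN_ELSE(F)("M")("N")` selects `"N"`.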



3.2 Natural numbers 

If $f$ and $x$ are lambda terms, and $n \geq 0$ a natural number, write $f^n x$ for
the term $f(f(\ldots(fx)\ldots))$, where $f$ occurs $n$ times. For each natural number
$n$, we define a lambda term $\overline{n}$, called the $n$th Church numeral, as
$\overline{n} = \lambda fx.f^n x$. Here are the first few Church numerals:

\[
\begin{aligned}
\overline{0} &= \lambda fx.x \\
\overline{1} &= \lambda fx.fx \\
\overline{2} &= \lambda fx.f(fx) \\
\overline{3} &= \lambda fx.f(f(fx))
\end{aligned}
\]

This particular way of encoding the natural numbers is due to Alonzo Church,
who was also the inventor of the lambda calculus. Note that $\overline{0}$ is in fact
the same term as $\mathbf{F}$; thus, when interpreting a lambda term, we should know
ahead of time whether to interpret the result as a boolean or a numeral.

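The successor function can be defined as follows:
$\mathbf{succ} = \lambda nfx.f(nfx)$. What does this term compute when applied to a
numeral?

\[
\begin{aligned}
\mathbf{succ}\,\overline{n} &= (\lambda nfx.f(nfx))(\lambda fx.f^n x) \\
  &\to_\beta \lambda fx.f((\lambda fx.f^n x)fx) \\
  &\twoheadrightarrow_\beta \lambda fx.f(f^n x) \\
  &= \lambda fx.f^{n+1} x \\
  &= \overline{n+1}
\end{aligned}
\]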

Thus, we have proved that the term $\mathbf{succ}$ does indeed encode the successor function,
when applied to a numeral. Here are possible definitions of addition and
multiplication:

\[
\begin{aligned}
\mathbf{add} &= \lambda nmfx.nf(mfx) \\
\mathbf{mult} &= \lambda nmf.n(mf)
\end{aligned}
\]
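
As a sanity check on these definitions, here is a small Python sketch of Church numerals as curried functions. The helpers `church` and `to_int`, which convert between Python integers and numerals, are assumptions made for illustration and do not appear in the notes; `SUCC`, `ADD`, and `MULT` transcribe the three terms above.

```python
def church(n):
    """Build the Church numeral λf.λx.f^n x for a Python int n >= 0."""
    def numeral(f):
        def apply_n_times(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply_n_times
    return numeral


def to_int(numeral):
    """Decode a Church numeral by counting applications of f (hypothetical helper)."""
    return numeral(lambda k: k + 1)(0)


SUCC = lambda n: lambda f: lambda x: f(n(f)(x))                # succ = λnfx.f(nfx)
ADD = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))    # add  = λnmfx.nf(mfx)
MULT = lambda n: lambda m: lambda f: n(m(f))                   # mult = λnmf.n(mf)

print(to_int(SUCC(church(3))))             # 4
print(to_int(ADD(church(2))(church(3))))   # 5
print(to_int(MULT(church(2))(church(3))))  # 6
```

Decoding with `to_int` only inspects the behaviour of a numeral, so it cannot distinguish two terms that compute the same function; it is a test of the encoding, not a proof.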
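
Exercise 6. (a) Manually evaluate the lambda terms $\mathbf{add}\,\overline{2}\,\overline{3}$ and $\mathbf{mult}\,\overline{2}\,\overline{3}$.

(b) Prove that $\mathbf{add}\,\overline{n}\,\overline{m} \twoheadrightarrow_\beta \overline{n+m}$, for all natural numbers $n$, $m$.

(c) Prove that $\mathbf{mult}\,\overline{n}\,\overline{m} \twoheadrightarrow_\beta \overline{n \cdot m}$, for all natural numbers $n$, $m$.

Definition. Suppose $f : \mathbb{N}^k \to \mathbb{N}$ is a $k$-ary function on the natural numbers, and
that $M$ is a lambda term. We say that $M$ (numeralwise) represents $f$ if for all
$n_1, \ldots, n_k \in \mathbb{N}$,

\[ M\,\overline{n_1} \ldots \overline{n_k} \twoheadrightarrow_\beta \overline{f(n_1, \ldots, n_k)}. \]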

This definition makes explicit what it means to be an "encoding". We can say, for
instance, that the term $\mathbf{add} = \lambda nmfx.nf(mfx)$ represents the addition function.
The definition generalizes easily to boolean functions, or functions of other
datatypes.

Often handy is the function iszero from natural numbers to booleans, which is
defined by

\[
\begin{aligned}
\mathrm{iszero}(0) &= \mathrm{true} \\
\mathrm{iszero}(n) &= \mathrm{false}, \quad \text{if } n \neq 0.
\end{aligned}
\]

Convince yourself that the following term is a representation of this function:

\[ \mathbf{iszero} = \lambda nxy.n(\lambda z.y)x. \]
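
One way to convince yourself is to run the term on a few numerals. The sketch below transcribes the term into Python; the numerals `ZERO` and `TWO` are written out by hand and `to_bool` is a hypothetical decoding helper. The idea is that a numeral applies $\lambda z.y$ to $x$ a total of $n$ times: zero applications leave $x$, and one or more applications yield $y$.

```python
T = lambda x: lambda y: x                                   # T = λxy.x
F = lambda x: lambda y: y                                   # F = λxy.y
ZERO = lambda f: lambda x: x                                # numeral 0 = λfx.x
TWO = lambda f: lambda x: f(f(x))                           # numeral 2 = λfx.f(fx)
ISZERO = lambda n: lambda x: lambda y: n(lambda z: y)(x)    # iszero = λnxy.n(λz.y)x


def to_bool(b):
    """Decode a Church boolean into a Python bool (hypothetical helper)."""
    return b(True)(False)


print(to_bool(ISZERO(ZERO)))  # True
print(to_bool(ISZERO(TWO)))   # False
```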

Exercise 7. Find lambda terms that represent each of the following functions:

(a) $f(n) = (n + 3)^2$,

(b) $f(n) = \mathrm{true}$ if $n$ is even, and $f(n) = \mathrm{false}$ if $n$ is odd,

(c) $\exp(n, m) = n^m$,

(d) $\mathrm{pred}(n) = n - 1$.

Note: part (d) is not easy. In fact, Church believed for a while that it was impossible,
until his student Kleene found a solution. (Kleene said he found
the solution while having his wisdom teeth pulled, so his trick for defining the
predecessor function is sometimes referred to as the "wisdom teeth trick".)

We have seen how to encode some simple boolean and arithmetic functions. However,
we do not yet have a systematic method of constructing such functions. What
we need is a mechanism for defining more complicated functions from simple
ones. Consider for example the factorial function, defined by:

\[
\begin{aligned}
0! &= 1 \\
n! &= n \cdot (n-1)!, \quad \text{if } n \neq 0.
\end{aligned}
\]

The encoding of such functions in the lambda calculus is the subject of the next
section. It is related to the concept of a fixpoint.

3.3 Fixpoints and recursive functions 

Suppose $f$ is a function. We say that $x$ is a fixpoint of $f$ if $f(x) = x$. In arithmetic
and calculus, some functions have fixpoints, while others don't. For instance,
$f(x) = x^2$ has two fixpoints $0$ and $1$, whereas $f(x) = x + 1$ has no fixpoints.
Some functions have infinitely many fixpoints, notably $f(x) = x$.

We apply the notion of fixpoints to the lambda calculus. If $F$ and $N$ are lambda
terms, we say that $N$ is a fixpoint of $F$ if $FN =_\beta N$. The lambda calculus
contrasts with arithmetic in that every lambda term has a fixpoint. This is perhaps
the first surprising fact about the lambda calculus we learn in this course.

Theorem 3.1. In the untyped lambda calculus, every term $F$ has a fixpoint.

Proof. Let $A = \lambda xy.y(xxy)$, and define $\Theta = AA$. Now suppose $F$ is any lambda
term, and let $N = \Theta F$. We claim that $N$ is a fixpoint of $F$. This is shown by the
following calculation:

\[
\begin{aligned}
N &= \Theta F \\
&= AAF \\
&= (\lambda xy.y(xxy))AF \\
&\twoheadrightarrow_\beta F(AAF) \\
&= F(\Theta F) \\
&= FN.
\end{aligned}
\]

□

The term $\Theta$ used in the proof is called Turing's fixpoint combinator.

The importance of fixpoints lies in the fact that they allow us to solve equations.
After all, finding a fixpoint for $f$ is the same thing as solving the equation
$x = f(x)$. This covers equations with an arbitrary right-hand side, whose left-hand
side is $x$. From the above theorem, we know that we can always solve such
equations in the lambda calculus.

To see how to apply this idea, consider the question from the last section, namely,
how to define the factorial function. The most natural definition of the factorial
function is recursive, and we can write it in the lambda calculus as follows:

\[ \mathbf{fact}\,n = \mathbf{if\_then\_else}\,(\mathbf{iszero}\,n)(\overline{1})(\mathbf{mult}\,n(\mathbf{fact}(\mathbf{pred}\,n))) \]

Here we have used various abbreviations for lambda terms that were introduced in
the previous section. The evident problem with a recursive definition such as this
one is that the term to be defined, $\mathbf{fact}$, appears both on the left- and the right-hand
side. In other words, to find $\mathbf{fact}$ requires solving an equation!

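We now apply our newfound knowledge of how to solve fixpoint equations in the
lambda calculus. We start by rewriting the problem slightly:

\[
\begin{aligned}
\mathbf{fact} &= \lambda n.\mathbf{if\_then\_else}\,(\mathbf{iszero}\,n)(\overline{1})(\mathbf{mult}\,n(\mathbf{fact}(\mathbf{pred}\,n))) \\
\mathbf{fact} &= (\lambda f.\lambda n.\mathbf{if\_then\_else}\,(\mathbf{iszero}\,n)(\overline{1})(\mathbf{mult}\,n(f(\mathbf{pred}\,n))))\,\mathbf{fact}
\end{aligned}
\]

Let us temporarily write $F$ for the term

\[ \lambda f.\lambda n.\mathbf{if\_then\_else}\,(\mathbf{iszero}\,n)(\overline{1})(\mathbf{mult}\,n(f(\mathbf{pred}\,n))). \]

Then the last equation becomes $\mathbf{fact} = F\,\mathbf{fact}$, which is a fixpoint equation. We
can solve it up to $\beta$-equivalence, by letting

\[
\begin{aligned}
\mathbf{fact} &= \Theta F \\
&= \Theta(\lambda f.\lambda n.\mathbf{if\_then\_else}\,(\mathbf{iszero}\,n)(\overline{1})(\mathbf{mult}\,n(f(\mathbf{pred}\,n))))
\end{aligned}
\]

Note that $\mathbf{fact}$ has disappeared from the right-hand side. The right-hand side is a
closed lambda term that represents the factorial function. (A lambda term is called
closed if it contains no free variables.)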
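
The same construction can be imitated in Python, with two caveats that are assumptions of this sketch rather than part of the notes. First, Python evaluates arguments eagerly, so a direct transcription of $\Theta = AA$ with $A = \lambda xy.y(xxy)$ would recurse forever; the code below therefore uses the eta-expanded variant $A = \lambda xy.y(\lambda z.xxyz)$, which delays the self-application until it is needed. Second, native integers and Python's conditional expression stand in for Church numerals and $\mathbf{if\_then\_else}$ to keep the example short.

```python
# Call-by-value variant of Turing's combinator (swapped in for the original,
# which would loop under Python's eager evaluation): A = λx.λy. y (λz. x x y z)
A = lambda x: lambda y: y(lambda z: x(x)(y)(z))
THETA = A(A)

# F = λf.λn. if n = 0 then 1 else n * f(n - 1), on native Python ints.
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# fact is obtained as a fixpoint of F; note that F itself is not recursive.
fact = THETA(F)

print([fact(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```

The point of the sketch is that `F` never refers to `fact` or to itself: all recursion comes from the fixpoint combinator.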

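To see how this definition works in practice, let us evaluate $\mathbf{fact}\,\overline{2}$. Recall from
the proof of Theorem 3.1 that $\Theta F \twoheadrightarrow_\beta F(\Theta F)$, therefore $\mathbf{fact} \twoheadrightarrow_\beta F\,\mathbf{fact}$.

\[
\begin{aligned}
\mathbf{fact}\,\overline{2}
&\twoheadrightarrow_\beta F\,\mathbf{fact}\,\overline{2} \\
&\twoheadrightarrow_\beta \mathbf{if\_then\_else}\,(\mathbf{iszero}\,\overline{2})(\overline{1})(\mathbf{mult}\,\overline{2}(\mathbf{fact}(\mathbf{pred}\,\overline{2}))) \\
&\twoheadrightarrow_\beta \mathbf{if\_then\_else}\,(\mathbf{F})(\overline{1})(\mathbf{mult}\,\overline{2}(\mathbf{fact}(\mathbf{pred}\,\overline{2}))) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{fact}(\mathbf{pred}\,\overline{2})) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{fact}\,\overline{1}) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(F\,\mathbf{fact}\,\overline{1}) \\
&\twoheadrightarrow_\beta \cdots \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{mult}\,\overline{1}(\mathbf{fact}\,\overline{0})) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{mult}\,\overline{1}(F\,\mathbf{fact}\,\overline{0})) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{mult}\,\overline{1}(\mathbf{if\_then\_else}\,(\mathbf{iszero}\,\overline{0})(\overline{1})(\mathbf{mult}\,\overline{0}(\mathbf{fact}(\mathbf{pred}\,\overline{0}))))) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{mult}\,\overline{1}(\mathbf{if\_then\_else}\,(\mathbf{T})(\overline{1})(\mathbf{mult}\,\overline{0}(\mathbf{fact}(\mathbf{pred}\,\overline{0}))))) \\
&\twoheadrightarrow_\beta \mathbf{mult}\,\overline{2}(\mathbf{mult}\,\overline{1}\,\overline{1}) \\
&\twoheadrightarrow_\beta \overline{2}
\end{aligned}
\]

Note that this calculation, while messy, is completely mechanical. You can easily
convince yourself that $\mathbf{fact}\,\overline{3}$ reduces to $\mathbf{mult}\,\overline{3}(\mathbf{fact}\,\overline{2})$, and therefore, by the
above calculation, to $\mathbf{mult}\,\overline{3}\,\overline{2}$, and finally to $\overline{6}$. It is now a matter of a simple
induction to prove that $\mathbf{fact}\,\overline{n} \twoheadrightarrow_\beta \overline{n!}$, for any $n$.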

Exercise 8. Write a lambda term that represents the Fibonacci function, defined
by

\[ f(0) = 1, \quad f(1) = 1, \quad f(n+2) = f(n+1) + f(n), \text{ for } n \geq 0. \]

Exercise 9. Write a lambda term that represents the characteristic function of the
prime numbers, i.e., $f(n) = \mathrm{true}$ if $n$ is prime, and $\mathrm{false}$ otherwise.

Exercise 10. We have remarked at the beginning of this section that the number-theoretic
function $f(x) = x + 1$ does not have a fixpoint. On the other hand, the
lambda term $F = \lambda x.\mathbf{succ}\,x$, which represents the same function, does have a
fixpoint by Theorem 3.1. How can you reconcile the two statements?

Exercise 11. The first fixpoint combinator for the lambda calculus was discovered
by Curry. Curry's fixpoint combinator, which is also called the paradoxical
fixpoint combinator, is the term $\mathbf{Y} = \lambda f.(\lambda x.f(xx))(\lambda x.f(xx))$.

(a) Prove that this is indeed a fixpoint combinator, i.e., that $\mathbf{Y}F$ is a fixpoint of
$F$, for any term $F$.

(b) Turing's fixpoint combinator not only satisfies $\Theta F =_\beta F(\Theta F)$, but also
$\Theta F \twoheadrightarrow_\beta F(\Theta F)$. We used this fact in evaluating $\mathbf{fact}\,\overline{2}$. Does an analogous
property hold for $\mathbf{Y}$? Does this affect the outcome of the evaluation of
$\mathbf{fact}\,\overline{2}$?

(c) Can you find another fixpoint combinator, besides Curry's and Turing's?



3.4 Other datatypes: pairs, tuples, lists, trees, etc. 

So far, we have discussed lambda terms that represent functions on booleans
and natural numbers. However, it is easily possible to encode more general data
structures in the untyped lambda calculus. Pairs and tuples are of interest to everybody.
The examples of lists and trees are primarily interesting to people with
experience in a list-processing language such as LISP or PROLOG; you can safely
ignore these examples if you want to.

Pairs. If $M$ and $N$ are lambda terms, we define the pair $\langle M, N\rangle$ to be the lambda
term $\lambda z.zMN$. We also define two terms $\mathbf{left} = \lambda p.p(\lambda xy.x)$ and
$\mathbf{right} = \lambda p.p(\lambda xy.y)$. We observe the following:

\[
\begin{aligned}
\mathbf{left}\,\langle M, N\rangle &\twoheadrightarrow_\beta M \\
\mathbf{right}\,\langle M, N\rangle &\twoheadrightarrow_\beta N
\end{aligned}
\]

The terms $\mathbf{left}$ and $\mathbf{right}$ are called the left and right projections.
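
The pair encoding carries over directly to curried Python functions. In this sketch, `PAIR` is a hypothetical constructor that builds $\langle M, N\rangle = \lambda z.zMN$ from its two components, and the strings `"M"` and `"N"` are placeholders for arbitrary terms.

```python
PAIR = lambda m: lambda n: lambda z: z(m)(n)   # <M, N> = λz.zMN
LEFT = lambda p: p(lambda x: lambda y: x)      # left  = λp.p(λxy.x)
RIGHT = lambda p: p(lambda x: lambda y: y)     # right = λp.p(λxy.y)

p = PAIR("M")("N")
print(LEFT(p), RIGHT(p))  # M N
```

A pair is thus a function waiting for a "selector"; the projections simply supply the selector that picks the first or the second argument.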

Tuples. The encoding of pairs easily extends to arbitrary $n$-tuples. If $M_1, \ldots, M_n$
are terms, we define the $n$-tuple $\langle M_1, \ldots, M_n\rangle$ as the lambda term $\lambda z.zM_1 \ldots M_n$,
and we define the $i$th projection $\pi^n_i = \lambda p.p(\lambda x_1 \ldots x_n.x_i)$. Then

\[ \pi^n_i\,\langle M_1, \ldots, M_n\rangle \twoheadrightarrow_\beta M_i, \quad \text{for all } 1 \leq i \leq n. \]

Lists. A list is different from a tuple, because its length is not necessarily fixed.
A list is either empty ("nil"), or else it consists of a first element (the "head")
followed by another list (the "tail"). We write $\mathbf{nil}$ for the empty list, and $H :: T$
for the list whose head is $H$ and whose tail is $T$. So, for instance, the list of the
first three numbers can be written as $1 :: (2 :: (3 :: \mathbf{nil}))$. We usually omit the
parentheses, where it is understood that "$::$" associates to the right. Note that every
list ends in $\mathbf{nil}$.

In the lambda calculus, we can define $\mathbf{nil} = \lambda xy.y$ and $H :: T = \lambda xy.xHT$.
Here is a lambda term that adds a list of numbers:

\[ \mathbf{addlist}\,l = l(\lambda h\,t.\mathbf{add}\,h(\mathbf{addlist}\,t))(\overline{0}). \]

Of course, this is a recursive definition, and must be translated into an actual
lambda term by the method of Section 3.3. In the definition of $\mathbf{addlist}$, $l$ and $t$ are
lists of numbers, and $h$ is a number. If you are very diligent, you can calculate the
sum of last weekend's Canadian lottery results by evaluating the term

\[ \mathbf{addlist}\,(\overline{4} :: \overline{22} :: \overline{24} :: \overline{32} :: \overline{42} :: \overline{43} :: \mathbf{nil}). \]
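
For the less diligent, the list encoding can be run in Python. This is a sketch under simplifying assumptions: native integers and `+` replace Church numerals and $\mathbf{add}$, Python's own recursion replaces the fixpoint construction of Section 3.3, and `from_python` is a hypothetical helper for building encoded lists.

```python
NIL = lambda x: lambda y: y                               # nil    = λxy.y
CONS = lambda h: lambda t: lambda x: lambda y: x(h)(t)    # H :: T = λxy.xHT


def addlist(l):
    """addlist l = l (λh t. add h (addlist t)) 0, with native ints for numerals."""
    return l(lambda h: lambda t: h + addlist(t))(0)


def from_python(items):
    """Build an encoded list from a Python list (hypothetical helper)."""
    result = NIL
    for item in reversed(items):
        result = CONS(item)(result)
    return result


lottery = from_python([4, 22, 24, 32, 42, 43])
print(addlist(lottery))  # 167
```

An encoded list takes two arguments: what to do with a head and a tail, and what to return when the list is empty. This is why `addlist` needs no explicit test for emptiness.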

Note that lists enable us to give an alternative encoding of the natural numbers:
We can encode a natural number as a list of booleans, which we interpret as the
binary digits 0 and 1. Of course, with this encoding, we would have to carefully
redesign our basic functions, such as successor, addition, and multiplication.
However, if done properly, such an encoding would be a lot more efficient (in
terms of number of $\beta$-reductions to be performed) than the encoding by Church
numerals.

Trees. A binary tree is a data structure that can be one of two things: either a leaf,
labeled by a natural number, or a node, which has a left and a right subtree. We
write $\mathbf{leaf}(N)$ for a leaf labeled $N$, and $\mathbf{node}(L, R)$ for a node with left subtree $L$
and right subtree $R$. We can encode trees as lambda terms, for instance as follows:

\[ \mathbf{leaf}(n) = \lambda xy.xn, \qquad \mathbf{node}(L, R) = \lambda xy.yLR \]

As an illustration, here is a program (i.e., a lambda term) that adds all the numbers
at the leaves of a given tree.

\[ \mathbf{addtree}\,t = t(\lambda n.n)(\lambda l\,r.\mathbf{add}(\mathbf{addtree}\,l)(\mathbf{addtree}\,r)). \]
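
The tree encoding can be sketched in Python in the same way. As before, native integers and `+` stand in for numerals and $\mathbf{add}$, and Python's recursion stands in for the fixpoint combinator; these substitutions are assumptions of the sketch.

```python
LEAF = lambda n: lambda x: lambda y: x(n)                 # leaf(n)    = λxy.xn
NODE = lambda l: lambda r: lambda x: lambda y: y(l)(r)    # node(L, R) = λxy.yLR


def addtree(t):
    """addtree t = t (λn.n) (λl r. add (addtree l) (addtree r))."""
    return t(lambda n: n)(lambda l: lambda r: addtree(l) + addtree(r))


tree = NODE(NODE(LEAF(1))(LEAF(2)))(LEAF(3))
print(addtree(tree))  # 6
```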

Exercise 12. This is a voluntary programming exercise. 

(a) Write a lambda term that calculates the length of a list. 

(b) Write a lambda term that calculates the depth (i.e., the nesting level) of a 
tree. You may need to define a function max that calculates the maximum 
of two numbers. 

(c) Write a lambda term that sorts a list of numbers. You may assume given a 
term less that compares two numbers. 

4 The Church-Rosser Theorem 

4.1 Extensionality, $\eta$-equivalence, and $\eta$-reduction

In the untyped lambda calculus, any term can be applied to another term. Therefore,
any term can be regarded as a function. Consider a term $M$, not containing
the variable $x$, and consider the term $M' = \lambda x.Mx$. Then for any argument $A$,
we have $MA =_\beta M'A$. So in this sense, $M$ and $M'$ define "the same function".
Should $M$ and $M'$ be considered equivalent as terms?

The answer depends on whether we want to accept the principle that "if $M$ and $M'$
define the same function, then $M$ and $M'$ are equal". This is called the principle
of extensionality, and we have already encountered it in Section 1.1. Formally, the
extensionality rule is the following:

\[ (\mathrm{ext}_\forall) \quad \frac{\forall A.\ MA = M'A}{M = M'}. \]

In the presence of the axioms $(\xi)$, $(\mathrm{cong})$, and $(\beta)$, it can be easily seen that $MA = M'A$
is true for all terms $A$ if and only if $Mx = M'x$, where $x$ is a fresh variable.
Therefore, we can replace the extensionality rule by the following equivalent, but
simpler rule:

\[ (\mathrm{ext}) \quad \frac{Mx = M'x, \text{ where } x \notin FV(M, M')}{M = M'}. \]

Note that we can apply the extensionality rule in particular to the case where
$M' = \lambda x.Mx$, where $x$ is not free in $M$. As we have remarked above, $Mx =_\beta M'x$,
and thus extensionality implies that $M = \lambda x.Mx$. This last equation is called the
$\eta$-law (eta-law):

\[ (\eta) \quad M = \lambda x.Mx, \text{ where } x \notin FV(M). \]

In fact, $(\eta)$ and $(\mathrm{ext})$ are equivalent in the presence of the other axioms of the
lambda calculus. We have already seen that $(\mathrm{ext})$ and $(\beta)$ imply $(\eta)$. Conversely,
assume $(\eta)$, and assume that $Mx = M'x$, for some terms $M$ and $M'$ not containing
$x$ freely. Then by $(\xi)$, we have $\lambda x.Mx = \lambda x.M'x$, hence by $(\eta)$ and
transitivity, $M = M'$. Thus $(\mathrm{ext})$ holds.

We note that the $\eta$-law does not follow from the axioms and rules of the lambda
calculus that we have considered so far. In particular, the terms $x$ and $\lambda y.xy$
are not $\beta$-equivalent, although they are clearly $\eta$-equivalent. We will prove that
$x \neq_\beta \lambda y.xy$ in Corollary 4.5 below.

Single-step $\eta$-reduction is the smallest relation $\to_\eta$ satisfying $(\mathrm{cong}_1)$, $(\mathrm{cong}_2)$,
$(\xi)$, and the following axiom (which is the same as the $\eta$-law, directed right to
left):

\[ (\eta) \quad \lambda x.Mx \to_\eta M, \text{ where } x \notin FV(M). \]

Single-step $\beta\eta$-reduction $\to_{\beta\eta}$ is defined as the union of the single-step $\beta$- and
$\eta$-reductions, i.e., $M \to_{\beta\eta} M'$ iff $M \to_\beta M'$ or $M \to_\eta M'$. Multi-step $\eta$-reduction
$\twoheadrightarrow_\eta$, multi-step $\beta\eta$-reduction $\twoheadrightarrow_{\beta\eta}$, as well as $\eta$-equivalence $=_\eta$ and
$\beta\eta$-equivalence $=_{\beta\eta}$ are defined in the obvious way as we did for $\beta$-reduction and
equivalence. We also get the evident notions of $\eta$-normal form, $\beta\eta$-normal form,
etc.

4.2 Statement of the Church-Rosser Theorem, and some consequences

Theorem (Church and Rosser, 1936). Let $\twoheadrightarrow$ denote either $\twoheadrightarrow_\beta$ or $\twoheadrightarrow_{\beta\eta}$. Suppose
$M$, $N$, and $P$ are lambda terms such that $M \twoheadrightarrow N$ and $M \twoheadrightarrow P$. Then there
exists a lambda term $Z$ such that $N \twoheadrightarrow Z$ and $P \twoheadrightarrow Z$.

In pictures, the theorem states that the following diagram can always be completed:

(Diagram: solid arrows $M \twoheadrightarrow N$ and $M \twoheadrightarrow P$; dotted arrows $N \twoheadrightarrow Z$ and $P \twoheadrightarrow Z$.)

This property is called the Church-Rosser property, or confluence. Before we
prove the Church-Rosser Theorem, let us highlight some of its consequences.
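
Confluence can be observed experimentally on small terms. The following Python sketch is an illustration only, not a proof: it represents terms as nested tuples, enumerates all single-step $\beta$-reducts of a term, and checks that two different reducts of $M = (\lambda x.xx)(Iy)$ have a common reduct. The representation, the helper names, and the fresh-name scheme (`_v0`, `_v1`, ...) are assumptions of the sketch. Terms are compared syntactically rather than up to $\alpha$-equivalence, which is adequate for this example because no renaming takes place.

```python
import itertools

# Terms are nested tuples: ("var", x), ("lam", x, body), ("app", fun, arg).
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def app(fun, arg): return ("app", fun, arg)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

_fresh = (f"_v{i}" for i in itertools.count())  # assumed not to clash with user names

def subst(t, x, s):
    """Capture-avoiding substitution t[s/x]."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return app(subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                      # x is shadowed; nothing to substitute
        return t
    if y in free_vars(s):           # rename the bound variable to avoid capture
        z = next(_fresh)
        body, y = subst(body, y, var(z)), z
    return lam(y, subst(body, x, s))

def reducts(t):
    """All results of one single-step beta-reduction of t (one entry per redex)."""
    out = []
    if t[0] == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "lam":
            out.append(subst(fun[2], fun[1], arg))
        out += [app(f, arg) for f in reducts(fun)]
        out += [app(fun, a) for a in reducts(arg)]
    elif t[0] == "lam":
        out += [lam(t[1], b) for b in reducts(t[2])]
    return out

def reachable(t):
    """All multi-step reducts of t; terminates only if there are finitely many."""
    seen, todo = {t}, [t]
    while todo:
        for r in reducts(todo.pop()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

I = lam("z", var("z"))
M = app(lam("x", app(var("x"), var("x"))), app(I, var("y")))  # (λx.xx)(Iy)
N, P = reducts(M)                        # (Iy)(Iy) and (λx.xx)y
common = reachable(N) & reachable(P)
print(common)                            # the single term yy
```

Here $N = (Iy)(Iy)$ and $P = (\lambda x.xx)y$ are obtained by contracting the outer and the inner redex of $M$, and both reduce to $yy$, as the theorem predicts.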

Corollary 4.1. If $M =_\beta N$ then there exists some $Z$ with $M, N \twoheadrightarrow_\beta Z$. Similarly
for $\beta\eta$.

Proof. Please refer to Figure 1 for an illustration of this proof. Recall that $=_\beta$ is
the reflexive symmetric transitive closure of $\to_\beta$. Suppose that $M =_\beta N$. Then
there exist $n \geq 0$ and terms $M_0, \ldots, M_n$ such that $M = M_0$, $N = M_n$, and
for all $i = 1 \ldots n$, either $M_{i-1} \to_\beta M_i$ or $M_i \to_\beta M_{i-1}$. We prove the claim
by induction on $n$. For $n = 0$, we have $M = N$ and there is nothing to show.
Suppose the claim has been proven for $n - 1$. Then by induction hypothesis, there
exists a term $Z'$ such that $M \twoheadrightarrow_\beta Z'$ and $M_{n-1} \twoheadrightarrow_\beta Z'$. Further, we know that
either $N \to_\beta M_{n-1}$ or $M_{n-1} \to_\beta N$. In case $N \to_\beta M_{n-1}$, then $N \twoheadrightarrow_\beta Z'$,
and we are done. In case $M_{n-1} \to_\beta N$, we apply the Church-Rosser Theorem
to $M_{n-1}$, $Z'$, and $N$ to obtain a term $Z$ such that $Z' \twoheadrightarrow_\beta Z$ and $N \twoheadrightarrow_\beta Z$.
Since $M \twoheadrightarrow_\beta Z' \twoheadrightarrow_\beta Z$, we are done. The proof in the case of $\beta\eta$-reduction is
identical. □

Corollary 4.2. If $N$ is a $\beta$-normal form and $N =_\beta M$, then $M \twoheadrightarrow_\beta N$, and
similarly for $\beta\eta$.

Proof. By Corollary 4.1, there exists some $Z$ with $M, N \twoheadrightarrow_\beta Z$. But $N$ is a
normal form, thus $N =_\alpha Z$. □

Figure 1: The proof of Corollary 4.1

Corollary 4.3. If $M$ and $N$ are $\beta$-normal forms such that $M =_\beta N$, then $M =_\alpha N$,
and similarly for $\beta\eta$.

Proof. By Corollary 4.2, we have $M \twoheadrightarrow_\beta N$, but since $M$ is a normal form, we
have $M =_\alpha N$. □

Corollary 4.4. If $M =_\beta N$, then neither or both have a $\beta$-normal form. Similarly
for $\beta\eta$.

Proof. Suppose that $M =_\beta N$, and that one of them has a $\beta$-normal form. Say,
for instance, that $M$ has a normal form $Z$. Then $N =_\beta Z$, hence $N \twoheadrightarrow_\beta Z$ by
Corollary 4.2. □

Corollary 4.5. The terms $x$ and $\lambda y.xy$ are not $\beta$-equivalent. In particular, the
$\eta$-rule does not follow from the $\beta$-rule.

Proof. The terms $x$ and $\lambda y.xy$ are both $\beta$-normal forms, and they are not
$\alpha$-equivalent. It follows by Corollary 4.3 that $x \neq_\beta \lambda y.xy$. □

4.3 Preliminary remarks on the proof of the Church-Rosser Theorem

Consider any binary relation $\to$ on a set, and let $\twoheadrightarrow$ be its reflexive transitive
closure. Consider the following three properties of such relations:

(a) Solid arrows $M \twoheadrightarrow N$ and $M \twoheadrightarrow P$; dotted arrows $N \twoheadrightarrow Z$ and $P \twoheadrightarrow Z$.

(b) Solid arrows $M \to N$ and $M \to P$; dotted arrows $N \twoheadrightarrow Z$ and $P \twoheadrightarrow Z$.

(c) Solid arrows $M \to N$ and $M \to P$; dotted arrows $N \to Z$ and $P \to Z$.

Each of these properties states that for all $M$, $N$, $P$, if the solid arrows exist, then
there exists $Z$ such that the dotted arrows exist. The only difference between (a),
(b), and (c) is the difference between where $\to$ and $\twoheadrightarrow$ are used.

Property (a) is the Church-Rosser property. Property (c) is called the diamond
property (because the diagram is shaped like a diamond).

A naive attempt to prove the Church-Rosser Theorem might proceed as follows:
First, prove that the relation $\to_\beta$ satisfies property (b) (this is relatively easy to
prove); then use an inductive argument to conclude that it also satisfies property
(a).

Unfortunately, this does not work: the reason is that in general, property (b) does
not imply property (a)! An example of a relation that satisfies property (b) but not
property (a) is shown in Figure 2. In other words, a proof of property (b) is not
sufficient in order to prove property (a).
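
For finite relations, both properties can be checked by brute force. The Python sketch below does this for a standard four-element counterexample with edges $b \to a$, $b \to c$, $c \to b$, $c \to d$; this relation is chosen here for illustration and is not necessarily the one drawn in Figure 2. The function names are likewise illustrative.

```python
def closure(edges, nodes):
    """Reflexive transitive closure, as a dict: node -> set of reachable nodes."""
    reach = {n: {n} for n in nodes}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            for n in nodes:
                if a in reach[n] and not reach[b].issubset(reach[n]):
                    reach[n] |= reach[b]
                    changed = True
    return reach


def has_property_a(edges, nodes):
    """Church-Rosser: any two multi-step reducts of m have a common reduct."""
    reach = closure(edges, nodes)
    return all(reach[n] & reach[p]
               for m in nodes for n in reach[m] for p in reach[m])


def has_property_b(edges, nodes):
    """Any two single-step reducts of m have a common multi-step reduct."""
    reach = closure(edges, nodes)
    step = {m: {b for a, b in edges if a == m} for m in nodes}
    return all(reach[n] & reach[p]
               for m in nodes for n in step[m] for p in step[m])


nodes = {"a", "b", "c", "d"}
edges = {("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}
print(has_property_b(edges, nodes), has_property_a(edges, nodes))  # True False
```

In this relation, $a$ and $d$ are distinct normal forms that are both reachable from $b$, so property (a) fails, while every pair of single-step reducts can still be joined by going around the cycle between $b$ and $c$.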

On the other hand, property (c), the diamond property, does imply property (a).
This is very easy to prove by induction, and the proof is illustrated in Figure 3. But
unfortunately, $\beta$-reduction does not satisfy property (c), so again we are stuck.

To summarize, we are faced with the following dilemma:

• $\beta$-reduction satisfies property (b), but property (b) does not imply property
(a).

• Property (c) implies property (a), but $\beta$-reduction does not satisfy property
(c).

On the other hand, it seems hopeless to prove property (a) directly. In the next
section, we will solve this dilemma by defining yet another reduction relation $\triangleright$,
with the following properties:

• $\triangleright$ satisfies property (c), and

• the reflexive transitive closure of $\triangleright$ is the same as that of $\to_\beta$ (or $\to_{\beta\eta}$).

Figure 2: An example of a relation that satisfies property (b), but not property (a)

Figure 3: Proof that property (c) implies property (a)

4.4 Proof of the Church-Rosser Theorem 

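In this section, we will prove the Church-Rosser Theorem for $\beta\eta$-reduction. The
proof for $\beta$-reduction (without $\eta$) is very similar, and in fact slightly simpler, so we
omit it here. The proof presented here is due to Tait and Martin-Löf. We begin by
defining a new relation $M \triangleright M'$ on terms, called parallel one-step reduction. We
define $\triangleright$ to be the smallest relation satisfying

\[
\begin{aligned}
&(1) \quad x \triangleright x \\[1ex]
&(2) \quad \frac{P \triangleright P' \qquad N \triangleright N'}{PN \triangleright P'N'} \\[1ex]
&(3) \quad \frac{N \triangleright N'}{\lambda x.N \triangleright \lambda x.N'} \\[1ex]
&(4) \quad \frac{Q \triangleright Q' \qquad N \triangleright N'}{(\lambda x.Q)N \triangleright Q'[N'/x]} \\[1ex]
&(5) \quad \frac{P \triangleright P', \text{ where } x \notin FV(P)}{\lambda x.Px \triangleright P'}
\end{aligned}
\]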

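Lemma 4.6. (a) For all $M$, $M'$, if $M \to_{\beta\eta} M'$ then $M \triangleright M'$.

(b) For all $M$, $M'$, if $M \triangleright M'$ then $M \twoheadrightarrow_{\beta\eta} M'$.

(c) $\twoheadrightarrow_{\beta\eta}$ is the reflexive, transitive closure of $\triangleright$.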

Proof. (a) First note that we have $P \triangleright P$, for any term $P$. This is easily shown by
induction on $P$. We now prove the claim by induction on a derivation of $M \to_{\beta\eta} M'$.
Please refer to Sections 2.5 and 4.1 for the rules that define $\to_{\beta\eta}$. We make a
case distinction based on the last rule used in the derivation of $M \to_{\beta\eta} M'$.

• If the last rule was $(\beta)$, then $M = (\lambda x.Q)N$ and $M' = Q[N/x]$, for some
$Q$ and $N$. But then $M \triangleright M'$ by (4), using the facts $Q \triangleright Q$ and $N \triangleright N$.

• If the last rule was $(\eta)$, then $M = \lambda x.Px$ and $M' = P$, for some $P$ such
that $x \notin FV(P)$. Then $M \triangleright M'$ follows from (5), using $P \triangleright P$.

• If the last rule was $(\mathrm{cong}_1)$, then $M = PN$ and $M' = P'N$, for some $P$,
$P'$, and $N$ where $P \to_{\beta\eta} P'$. By induction hypothesis, $P \triangleright P'$. From this
and $N \triangleright N$, it follows immediately that $M \triangleright M'$ by (2).

• If the last rule was $(\mathrm{cong}_2)$, we proceed similarly to the last case.

• If the last rule was $(\xi)$, then $M = \lambda x.N$ and $M' = \lambda x.N'$ for some $N$ and
$N'$ such that $N \to_{\beta\eta} N'$. By induction hypothesis, $N \triangleright N'$, which implies
$M \triangleright M'$ by (3).

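(b) We prove this by induction on a derivation of $M \triangleright M'$. We distinguish several
cases, depending on the last rule used in the derivation.

• If the last rule was (1), then $M = M' = x$, and we are done because
$x \twoheadrightarrow_{\beta\eta} x$.

• If the last rule was (2), then $M = PN$ and $M' = P'N'$, for some $P$, $P'$,
$N$, $N'$ with $P \triangleright P'$ and $N \triangleright N'$. By induction hypothesis, $P \twoheadrightarrow_{\beta\eta} P'$ and
$N \twoheadrightarrow_{\beta\eta} N'$. Since $\twoheadrightarrow_{\beta\eta}$ satisfies $(\mathrm{cong})$, it follows that $PN \twoheadrightarrow_{\beta\eta} P'N'$,
hence $M \twoheadrightarrow_{\beta\eta} M'$ as desired.

• If the last rule was (3), then $M = \lambda x.N$ and $M' = \lambda x.N'$, for some $N$, $N'$
with $N \triangleright N'$. By induction hypothesis, $N \twoheadrightarrow_{\beta\eta} N'$, hence
$M = \lambda x.N \twoheadrightarrow_{\beta\eta} \lambda x.N' = M'$ by $(\xi)$.

• If the last rule was (4), then $M = (\lambda x.Q)N$ and $M' = Q'[N'/x]$, for some
$Q$, $Q'$, $N$, $N'$ with $Q \triangleright Q'$ and $N \triangleright N'$. By induction hypothesis,
$Q \twoheadrightarrow_{\beta\eta} Q'$ and $N \twoheadrightarrow_{\beta\eta} N'$. Therefore
$M = (\lambda x.Q)N \twoheadrightarrow_{\beta\eta} (\lambda x.Q')N' \to_{\beta\eta} Q'[N'/x] = M'$, as desired.

• If the last rule was (5), then $M = \lambda x.Px$ and $M' = P'$, for some $P$, $P'$
with $P \triangleright P'$, and $x \notin FV(P)$. By induction hypothesis, $P \twoheadrightarrow_{\beta\eta} P'$, hence
$M = \lambda x.Px \to_{\beta\eta} P \twoheadrightarrow_{\beta\eta} P' = M'$, as desired.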

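(c) This follows directly from (a) and (b). Let us write $R^*$ for the reflexive transitive
closure of a relation $R$. By (a), we have $\to_{\beta\eta} \subseteq \triangleright$, hence
$\twoheadrightarrow_{\beta\eta} = \to_{\beta\eta}^* \subseteq \triangleright^*$. By (b), we have $\triangleright \subseteq \twoheadrightarrow_{\beta\eta}$, hence
$\triangleright^* \subseteq \twoheadrightarrow_{\beta\eta}^* = \twoheadrightarrow_{\beta\eta}$. It follows that $\triangleright^* = \twoheadrightarrow_{\beta\eta}$. □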

We will soon prove that $\triangleright$ satisfies the diamond property. Note that together with
Lemma 4.6(c), this will immediately imply that $\to_{\beta\eta}$ satisfies the Church-Rosser
property.

Lemma 4.7 (Substitution). If $M \triangleright M'$ and $U \triangleright U'$, then $M[U/y] \triangleright M'[U'/y]$.

Proof. We assume without loss of generality that any bound variables of $M$ are
different from $y$ and from the free variables of $U$. The claim is now proved by
induction on derivations of $M \triangleright M'$. We distinguish several cases, depending on
the last rule used in the derivation:

• If the last rule was (1), then $M = M' = x$, for some variable $x$. If $x = y$,
then $M[U/y] = U \triangleright U' = M'[U'/y]$. If $x \neq y$, then by (1),
$M[U/y] = x \triangleright x = M'[U'/y]$.

• If the last rule was (2), then $M = PN$ and $M' = P'N'$, for some $P$, $P'$, $N$,
$N'$ with $P \triangleright P'$ and $N \triangleright N'$. By induction hypothesis, $P[U/y] \triangleright P'[U'/y]$
and $N[U/y] \triangleright N'[U'/y]$, hence by (2), $M[U/y] = P[U/y]N[U/y] \triangleright
P'[U'/y]N'[U'/y] = M'[U'/y]$.

• If the last rule was (3), then $M = \lambda x.N$ and $M' = \lambda x.N'$, for some $N$, $N'$
with $N \triangleright N'$. By induction hypothesis, $N[U/y] \triangleright N'[U'/y]$, hence by (3)
$M[U/y] = \lambda x.N[U/y] \triangleright \lambda x.N'[U'/y] = M'[U'/y]$.

• If the last rule was (4), then $M = (\lambda x.Q)N$ and $M' = Q'[N'/x]$, for some
$Q$, $Q'$, $N$, $N'$ with $Q \triangleright Q'$ and $N \triangleright N'$. By induction hypothesis, $Q[U/y] \triangleright
Q'[U'/y]$ and $N[U/y] \triangleright N'[U'/y]$, hence by (4), $(\lambda x.Q[U/y])N[U/y] \triangleright
Q'[U'/y][N'[U'/y]/x] = Q'[N'/x][U'/y]$. Thus $M[U/y] \triangleright M'[U'/y]$.

• If the last rule was (5), then $M = \lambda x.Px$ and $M' = P'$, for some $P$, $P'$ with
$P \triangleright P'$, and $x \notin FV(P)$. By induction hypothesis, $P[U/y] \triangleright P'[U'/y]$,
hence by (5), $M[U/y] = \lambda x.P[U/y]x \triangleright P'[U'/y] = M'[U'/y]$. □

A more conceptual way of looking at this proof is the following: consider any
derivation of $M \triangleright M'$ from axioms (1)-(5). In this derivation, replace any axiom
$y \triangleright y$ by $U \triangleright U'$, and propagate the changes (i.e., replace $y$ by $U$ on the
left-hand-side, and by $U'$ on the right-hand-side of any $\triangleright$). The result is a derivation
of $M[U/y] \triangleright M'[U'/y]$. (The formal proof that the result of this replacement
is indeed a valid derivation requires an induction, and this is the reason why the
proof of the substitution lemma is so long.)

Our next goal is to prove that $\triangleright$ satisfies the diamond property. Before proving this,
we first define the maximal parallel one-step reduct $M^*$ of a term $M$ as follows:

1. $x^* = x$, for a variable.

2. $(PN)^* = P^*N^*$, if $PN$ is not a $\beta$-redex.

3. $((\lambda x.Q)N)^* = Q^*[N^*/x]$.

4. $(\lambda x.N)^* = \lambda x.N^*$, if $\lambda x.N$ is not an $\eta$-redex.

5. $(\lambda x.Px)^* = P^*$, if $x \notin FV(P)$.

Note that $M^*$ depends only on $M$. The following lemma implies the diamond
property for $\triangleright$.
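
Since $M^*$ is defined by recursion on $M$, it can be computed directly. The following Python sketch transcribes the five clauses. The tuple representation of terms, the helper names, and the fresh-name scheme (`_v0`, `_v1`, ...) are assumptions of the sketch; results are compared syntactically, which suffices for the small examples shown.

```python
import itertools

# Terms are nested tuples: ("var", x), ("lam", x, body), ("app", fun, arg).
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def app(fun, arg): return ("app", fun, arg)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

_fresh = (f"_v{i}" for i in itertools.count())  # assumed not to clash with user names

def subst(t, x, s):
    """Capture-avoiding substitution t[s/x]."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return app(subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:
        return t
    if y in free_vars(s):
        z = next(_fresh)
        body, y = subst(body, y, var(z)), z
    return lam(y, subst(body, x, s))

def is_eta_redex(t):
    """True if t has the form λx.Px with x not free in P."""
    return (t[0] == "lam" and t[2][0] == "app"
            and t[2][2] == var(t[1]) and t[1] not in free_vars(t[2][1]))

def star(t):
    """Maximal parallel one-step reduct M* of a term M."""
    if t[0] == "var":                                   # clause 1
        return t
    if t[0] == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "lam":                             # clause 3: beta-redex
            return subst(star(fun[2]), fun[1], star(arg))
        return app(star(fun), star(arg))                # clause 2
    if is_eta_redex(t):                                 # clause 5
        return star(t[2][1])
    return lam(t[1], star(t[2]))                        # clause 4

I = lam("z", var("z"))
M = app(lam("x", app(var("x"), var("x"))), app(I, var("y")))  # (λx.xx)(Iy)
print(star(M))                                  # the term yy
print(star(lam("x", app(var("f"), var("x")))))  # the variable f
```

For $M = (\lambda x.xx)(Iy)$, both the outer redex and the inner redex $Iy$ are contracted in one parallel step, giving $M^* = yy$.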

Lemma 4.8 (Maximal parallel one-step reductions). Whenever $M \triangleright M'$, then
$M' \triangleright M^*$.

Proof. By induction on the size of $M$. We distinguish five cases, depending on
the last rule used in the derivation of $M \triangleright M'$. As usual, we assume that all bound
variables have been renamed to avoid clashes.

• If the last rule was (1), then $M = M' = x$, also $M^* = x$, and we are done.

• If the last rule was (2), then $M = PN$ and $M' = P'N'$, where $P \triangleright P'$ and
$N \triangleright N'$. By induction hypothesis $P' \triangleright P^*$ and $N' \triangleright N^*$. Two cases:

  - If $PN$ is not a $\beta$-redex, then $M^* = P^*N^*$. Thus $M' = P'N' \triangleright
    P^*N^* = M^*$ by (2), and we are done.

  - If $PN$ is a $\beta$-redex, say $P = \lambda x.Q$, then $M^* = Q^*[N^*/x]$. We distinguish
    two subcases, depending on the last rule used in the derivation
    of $P \triangleright P'$:

    * If the last rule was (3), then $P' = \lambda x.Q'$, where $Q \triangleright Q'$. By
      induction hypothesis $Q' \triangleright Q^*$, and with $N' \triangleright N^*$, it follows that
      $M' = (\lambda x.Q')N' \triangleright Q^*[N^*/x] = M^*$ by (4).

    * If the last rule was (5), then $P = \lambda x.Rx$ and $P' = R'$, where
      $x \notin FV(R)$ and $R \triangleright R'$. Consider the term $Q = Rx$. Since
      $Rx \triangleright R'x$, and $Rx$ is a subterm of $M$, by induction hypothesis
      $R'x \triangleright (Rx)^*$. By the substitution lemma, $M' = R'N' =
      (R'x)[N'/x] \triangleright (Rx)^*[N^*/x] = M^*$.

• If the last rule was (3), then $M = \lambda x.N$ and $M' = \lambda x.N'$, where $N \triangleright N'$.
Two cases:

  - If $M$ is not an $\eta$-redex, then $M^* = \lambda x.N^*$. By induction hypothesis,
    $N' \triangleright N^*$, hence $M' \triangleright M^*$ by (3).

  - If $M$ is an $\eta$-redex, then $N = Px$, where $x \notin FV(P)$. In this case,
    $M^* = P^*$. We distinguish two subcases, depending on the last rule
    used in the derivation of $N \triangleright N'$:

    * If the last rule was (2), then $N' = P'x$, where $P \triangleright P'$. By
      induction hypothesis $P' \triangleright P^*$. Hence $M' = \lambda x.P'x \triangleright P^* =
      M^*$ by (5).

    * If the last rule was (4), then $P = \lambda y.Q$ and $N' = Q'[x/y]$, where
      $Q \triangleright Q'$. Then $M' = \lambda x.Q'[x/y] = \lambda y.Q'$ (note $x \notin FV(Q')$).
      But $P \triangleright \lambda y.Q'$, hence by induction hypothesis, $\lambda y.Q' \triangleright P^* =
      M^*$.

• If the last rule was (4), then $M = (\lambda x.Q)N$ and $M' = Q'[N'/x]$, where
$Q \triangleright Q'$ and $N \triangleright N'$. Then $M^* = Q^*[N^*/x]$. By induction hypothesis,
$Q' \triangleright Q^*$ and $N' \triangleright N^*$, and therefore $M' \triangleright M^*$ by the
substitution lemma.

• If the last rule was (5), then $M = \lambda x.Px$ and $M' = P'$, where $P \triangleright P'$ and
$x \notin FV(P)$. Then $M^* = P^*$. By induction hypothesis, $P' \triangleright P^*$, hence
$M' \triangleright M^*$. □

The previous lemma immediately implies the diamond property for $\triangleright$:

Lemma 4.9 (Diamond property for $\triangleright$). If $M \triangleright N$ and $M \triangleright P$, then there exists $Z$
such that $N \triangleright Z$ and $P \triangleright Z$.

Proof. Take $Z = M^*$. □

Finally, we have a proof of the Church-Rosser Theorem:

Proof of the Church-Rosser Theorem (Section 4.2). Since $\triangleright$ satisfies the diamond property, it follows that
its reflexive transitive closure $\triangleright^*$ also satisfies the diamond property, as shown in
Figure 3. But $\triangleright^*$ is the same as $\twoheadrightarrow_{\beta\eta}$ by Lemma 4.6(c), and the diamond property
for $\twoheadrightarrow_{\beta\eta}$ is just the Church-Rosser property for $\to_{\beta\eta}$. □

4.5 Exercises

Exercise 13. Give a detailed proof that property (c) from Section 4.3 implies
property (a).

Exercise 14. Prove that $M \triangleright M$, for all terms $M$.

Exercise 15. Without using Lemma 4.8, prove that $M \triangleright M^*$ for all terms $M$.

Exercise 16. Let $\Omega = (\lambda x.xx)(\lambda x.xx)$. Prove that $\Omega \neq_{\beta\eta} \Omega\Omega$.

Exercise 17. What changes have to be made to Section 4.4 to get a proof of the
Church-Rosser Theorem for $\to_\beta$, instead of $\to_{\beta\eta}$?

Exercise 18. Recall the properties (a)-(c) of binary relations $\to$ that were discussed
in Section 4.3. Consider the following similar property, which is sometimes
called the "strip property":

(d) Solid arrows $M \to N$ and $M \twoheadrightarrow P$; dotted arrows $N \twoheadrightarrow Z$ and $P \twoheadrightarrow Z$.

Does (d) imply (a)? Does (b) imply (d)? In each case, give either a proof or a
counterexample.

Exercise 19. To every lambda term $M$, we may associate a directed graph (with
possibly multiple edges and loops) $\mathcal{G}(M)$ as follows: (i) the vertices are terms
$N$ such that $M \twoheadrightarrow_\beta N$, i.e., all the terms that $M$ can $\beta$-reduce to; (ii) the edges
are given by a single-step $\beta$-reduction. Note that the same term may have two (or
more) reductions coming from different redexes; each such reduction is a separate
edge. For example, let $I = \lambda x.x$. Let $M = I(Ix)$. Then

\[ \mathcal{G}(M) = I(Ix) \rightrightarrows Ix \to x. \]

Note that there are two separate edges from $I(Ix)$ to $Ix$. We also sometimes
write bullets instead of terms, to get $\bullet \rightrightarrows \bullet \to \bullet$. As another example, let
$\Omega = (\lambda x.xx)(\lambda x.xx)$. Then $\mathcal{G}(\Omega)$ consists of a single vertex $\bullet$ with one loop.

(a) Let $M = (\lambda x.I(xx))(\lambda x.xx)$. Find $\mathcal{G}(M)$.

(b) For each of the following graphs, find a term $M$ such that $\mathcal{G}(M)$ is the given
graph, or explain why no such term exists. (Note: the "starting" vertex need
not always be the leftmost vertex in the picture.) Warning: some of these
terms are tricky to find!

(Graphs (i) to (vii) are not reproduced here.)



5 Combinatory algebras 



To give a model of the lambda calculus means to provide a mathematical space 
in which the axioms of lambda calculus are satisfied. This usually means that the 
elements of the space can be understood as functions, and that certain functions 
can be understood as elements. 

Naively, one might try to construct a model of lambda calculus by finding a set
$X$ such that $X$ is in bijective correspondence with the set $X^X$ of all functions
from $X$ to $X$. This, however, is impossible: for cardinality reasons, the equation
$X \cong X^X$ has no solutions except for a one-element set $X = 1$. To see this, first
note that the empty set $\emptyset$ is not a solution. Also, suppose $X$ is a solution with
$|X| \geq 2$. Then $|X^X| \geq |2^X|$, but by Cantor's argument, $|2^X| > |X|$, hence $X^X$
is of greater cardinality than $X$, contradicting $X \cong X^X$.

There are two main strategies for constructing models of the lambda calculus, and
both involve a restriction on the class of functions to make it smaller. The first
approach, which will be discussed in this section, uses algebra, and the essential
idea is to replace the set $X^X$ of all functions by a smaller, and suitably defined
set of polynomials. The second approach is to equip the set $X$ with additional
structure (such as topology, ordered structure, etc), and to replace $X^X$ by a set
of structure-preserving functions (for example, continuous functions, monotone
functions, etc).

5.1 Applicative structures 

Definition. An applicative structure $(\mathbf{A}, \cdot)$ is a set $\mathbf{A}$ together with a binary
operation "$\cdot$".

Note that there are no further assumptions; in particular, we do not assume that
application is an associative operation. We write $ab$ for $a \cdot b$, and as in the lambda
calculus, we follow the convention of left associativity, i.e., we write $abc$ for $(ab)c$.

Definition. Let $(\mathbf{A}, \cdot)$ be an applicative structure. A polynomial in a set of variables
$x_1, \ldots, x_n$ and with coefficients in $\mathbf{A}$ is a formal expression built from variables
and elements of $\mathbf{A}$ by means of the application operation. In other words,
the set of polynomials is given by the following grammar:

\[ t, s ::= x \mid a \mid ts, \]

where $x$ ranges over variables and $a$ ranges over the elements of $\mathbf{A}$. We write
$\mathbf{A}\{x_1, \ldots, x_n\}$ for the set of polynomials in variables $x_1, \ldots, x_n$ with coefficients
in $\mathbf{A}$.

Here are some examples of polynomials in the variables $x$, $y$, $z$, where $a, b \in \mathbf{A}$:

\[ x, \qquad xy, \qquad axx, \qquad (x(y(zb)))(ax). \]
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If t{xi, . . . , Xn) is a polynomial in the indicated variables, and 61, . . . , 6„ are el- 
ements of A, then we can evaluate the polynomial at the given elements: the 
evaluation t{bi , . . . , 6„) the element of A obtained by "plugging" Xi — bi into the 
polynomial, for i = 1, . . . , rt, and evaluating the resulting expression in A. Note 
that in this way, every polynomial t in n variables can be understood as a. function 
from A" -^ A. This is very similar to the usual polynomials in algebra, which 
can also either be understood as formal expressions or as functions. 
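
To make the two readings of a polynomial concrete, here is a minimal Python sketch that represents polynomials as formal expressions and evaluates them in a given applicative structure. The class names `Var`, `Const`, `App`, the function `evaluate`, and the sample structure (the integers with $a \cdot b = a - b$) are illustrative assumptions, not part of these notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str            # a variable such as x, y, z

@dataclass(frozen=True)
class Const:
    value: object        # a coefficient, i.e. an element of A

@dataclass(frozen=True)
class App:
    left: object         # the polynomial being applied
    right: object        # the polynomial it is applied to

def evaluate(t, env, apply):
    """Evaluate polynomial t, reading variables from env and combining with apply."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, Const):
        return t.value
    return apply(evaluate(t.left, env, apply), evaluate(t.right, env, apply))

# Hypothetical applicative structure: the integers with a . b = a - b.
def sub(a, b):
    return a - b

# The polynomial axx = (ax)x with coefficient a = 10, evaluated at x = 3.
axx = App(App(Const(10), Var("x")), Var("x"))
value = evaluate(axx, {"x": 3}, sub)   # computes (10 - 3) - 3
print(value)
```

Evaluating the differently bracketed polynomial $a(xx)$ at the same point gives a different value in this structure, which reflects that application is not assumed to be associative.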

If $t(x_1, \ldots, x_n)$ and $s(x_1, \ldots, x_n)$ are two polynomials with coefficients in $\mathbf{A}$,
we say that the equation $t(x_1, \ldots, x_n) = s(x_1, \ldots, x_n)$ holds in $\mathbf{A}$ if for all
$b_1, \ldots, b_n \in \mathbf{A}$, $t(b_1, \ldots, b_n) = s(b_1, \ldots, b_n)$.

5.2 Combinatory completeness 

Definition (Combinatory completeness). An applicative structure $(\mathbf{A}, \cdot)$ is
combinatorially complete if for every polynomial $t(x_1, \ldots, x_n)$ of $n \geq 0$ variables,
there exists some element $a \in \mathbf{A}$ such that

    $a x_1 \ldots x_n = t(x_1, \ldots, x_n)$

holds in $\mathbf{A}$.

In other words, combinatory completeness means that every polynomial function
$t(x_1, \ldots, x_n)$ can be represented (in curried form) by some element of $\mathbf{A}$. We
are therefore setting up a correspondence between functions and elements as
discussed in the introduction of this section.

Note that we do not require the element $a$ to be unique in the definition of
combinatory completeness. This means that we are dealing with an intensional view
of functions, where a given function might in general have several different names
(but see the discussion of extensionality in Section 5.6).

The following theorem characterizes combinatory completeness in terms of a 
much simpler algebraic condition. 

Theorem 5.1. An applicative structure $(\mathbf{A}, \cdot)$ is combinatorially complete if and
only if there exist two elements $s, k \in \mathbf{A}$ such that the following equations are
satisfied for all $x, y, z \in \mathbf{A}$:

(1) $sxyz = (xz)(yz)$

(2) $kxy = x$

Example 5.2. Before we prove this theorem, let us look at a few examples.

(a) The identity function. Can we find an element $i \in \mathbf{A}$ such that $ix = x$ for
    all $x$? Yes, indeed, we can let $i = skk$. We check that for all $x$,
    $skkx = (kx)(kx) = x$.

(b) The boolean "false". Can we find an element $F$ such that for all $x, y$,
    $Fxy = x$? Yes, this is easy: $F = k$.

(c) The boolean "true". Can we find $T$ such that $Txy = y$? Yes, what we need
    is $Tx = i$. Therefore a solution is $T = ki$. And indeed, for all $y$, we have

        $kixy = iy = y.$

(d) Find a function $f$ such that $fx = xx$ for all $x$. Solution: let $f = sii$. Then
    $siix = (ix)(ix) = xx$.
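
The calculations in Example 5.2 can be replayed informally in Python, with curried functions standing in for elements and a function call standing in for application. This is only a sketch under that assumption: Python values do not form a combinatory algebra in the strict sense (not every value can be applied), and the names below simply mirror the letters used in the example.

```python
# Curried Python functions play the role of elements; application is a call.
s = lambda x: lambda y: lambda z: x(z)(y(z))   # sxyz = (xz)(yz)
k = lambda x: lambda y: x                      # kxy = x

i = s(k)(k)      # (a) ix = x
F = k            # (b) Fxy = x
T = k(i)         # (c) Txy = y
f = s(i)(i)      # (d) fx = xx

print(i(42), F("x")("y"), T("x")("y"))
# f(k) is kk, and kk applied to 1, 2, 3 reduces as (kk1)23 = k23 = 2.
print(f(k)(1)(2)(3))
```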

Proof of Theorem 5.1. The "only if" direction is trivial. If $\mathbf{A}$ is combinatorially
complete, then consider the polynomial $t(x, y, z) = (xz)(yz)$. By combinatory
completeness, there exists some $s \in \mathbf{A}$ with $sxyz = t(x, y, z)$, and similarly for
$k$.

We thus have to prove the "if" direction. Recall that $\mathbf{A}\{x_1, \ldots, x_n\}$ is the set of
polynomials with variables $x_1, \ldots, x_n$. For each polynomial $t \in \mathbf{A}\{x, y_1, \ldots, y_n\}$
in $n + 1$ variables, we will define a new polynomial $\lambda^* x.t \in \mathbf{A}\{y_1, \ldots, y_n\}$ in $n$
variables, as follows by recursion on $t$:



    $\lambda^* x.x   := i,$
    $\lambda^* x.y_i := k y_i$                     where $y_i \neq x$ is a variable,
    $\lambda^* x.a   := k a$                       where $a \in \mathbf{A}$,
    $\lambda^* x.pq  := s(\lambda^* x.p)(\lambda^* x.q).$



We claim that for all $t$, the equation $(\lambda^* x.t)x = t$ holds in $\mathbf{A}$. Indeed, this is
easily proved by induction on $t$, using the definition of $\lambda^*$:

    $(\lambda^* x.x)x   = ix = x,$
    $(\lambda^* x.y_i)x = k y_i x = y_i,$
    $(\lambda^* x.a)x   = kax = a,$
    $(\lambda^* x.pq)x  = s(\lambda^* x.p)(\lambda^* x.q)x = ((\lambda^* x.p)x)((\lambda^* x.q)x) = pq.$

Note that the last case uses the induction hypothesis for $p$ and $q$.

Finally, to prove the theorem, assume that $\mathbf{A}$ has elements $s, k$ satisfying
equations (1) and (2), and consider a polynomial $t \in \mathbf{A}\{x_1, \ldots, x_n\}$. We must
show that there exists $a \in \mathbf{A}$ such that $a x_1 \ldots x_n = t$ holds in $\mathbf{A}$. We let

    $a = \lambda^* x_1. \ldots \lambda^* x_n.t.$

Note that $a$ is a polynomial in 0 variables, which we may consider as an element
of $\mathbf{A}$. Then from the previous claim, it follows that

    $a x_1 \ldots x_n = (\lambda^* x_1.\lambda^* x_2. \ldots \lambda^* x_n.t) x_1 x_2 \ldots x_n$
    $\phantom{a x_1 \ldots x_n} = (\lambda^* x_2. \ldots \lambda^* x_n.t) x_2 \ldots x_n$
    $\phantom{a x_1 \ldots x_n} = \ldots$
    $\phantom{a x_1 \ldots x_n} = (\lambda^* x_n.t) x_n$
    $\phantom{a x_1 \ldots x_n} = t$

holds in $\mathbf{A}$. □
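
The recursive definition of $\lambda^*$ used in this proof translates almost line by line into code. The following Python sketch is one possible rendering, assuming that polynomials are encoded as tagged tuples and that $s$ and $k$ are modelled by curried Python functions; the names `var`, `const`, `app`, `abstract`, and `evaluate` are hypothetical helpers chosen for this illustration.

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

def var(name):   return ("var", name)
def const(a):    return ("const", a)
def app(p, q):   return ("app", p, q)

I = app(app(const(S), const(K)), const(K))       # i = skk

def abstract(x, t):
    """Return the polynomial lambda* x. t, clause by clause."""
    if t == var(x):                               # lambda* x.x   = i
        return I
    if t[0] in ("var", "const"):                  # lambda* x.y_i = k y_i, lambda* x.a = k a
        return app(const(K), t)
    _, p, q = t                                   # lambda* x.pq  = s (lambda* x.p) (lambda* x.q)
    return app(app(const(S), abstract(x, p)), abstract(x, q))

def evaluate(t, env):
    """Evaluate a polynomial, reading application as a Python call."""
    if t[0] == "var":
        return env[t[1]]
    if t[0] == "const":
        return t[1]
    return evaluate(t[1], env)(evaluate(t[2], env))

# t = yx, and a = lambda* x. lambda* y. t is a polynomial without variables.
t = app(var("y"), var("x"))
a = abstract("x", abstract("y", t))
result = evaluate(a, {})(3)(lambda n: n + 1)      # a x y with x = 3, y = successor
print(result)
```

Here `a` contains no variables, so it can be evaluated in the empty environment; applying it to a value for $x$ and then to a value for $y$ agrees with evaluating $t = yx$ directly, as the claim $a x_1 \ldots x_n = t$ predicts.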



5.3 Combinatory algebras 

By Theorem 5.1, combinatory completeness is equivalent to the existence of the $s$
and $k$ operators. We enshrine this in the following definition:

Definition (Combinatory algebra). A combinatory algebra $(\mathbf{A}, \cdot, s, k)$ is an
applicative structure $(\mathbf{A}, \cdot)$ together with elements $s, k \in \mathbf{A}$, satisfying the
following two axioms:

(1) $sxyz = (xz)(yz)$

(2) $kxy = x$

Remark 5.3. The operation $\lambda^*$, defined in the proof of Theorem 5.1, is defined
on the polynomials of any combinatory algebra. It is called the derived lambda
abstractor, and it satisfies the law of $\beta$-equivalence, i.e., $(\lambda^* x.t)b = t[b/x]$, for
all $b \in \mathbf{A}$.

Finding actual examples of combinatory algebras is not so easy. Here are some 
examples: 

Example 5.4. The one-element set $\mathbf{A} = \{*\}$, with $* \cdot * = *$, $s = *$, and $k = *$, is
a combinatory algebra. It is called the trivial combinatory algebra.

Example 5.5. Recall that $\Lambda$ is the set of lambda terms. Let $\mathbf{A} = \Lambda/{=_\beta}$, the set of
lambda terms modulo $\beta$-equivalence. Define $M \cdot N = MN$, $S = \lambda xyz.(xz)(yz)$,
and $K = \lambda xy.x$. Then $(\mathbf{A}, \cdot, S, K)$ is a combinatory algebra. Also note that, by
Corollary 4.5, this algebra is non-trivial, i.e., it has more than one element.

Similar examples are obtained by replacing $=_\beta$ by $=_{\beta\eta}$, and/or replacing $\Lambda$ by the
set $\Lambda_0$ of closed terms.

Example 5.6. We construct a combinatory algebra of SK-terms as follows. Let
$V$ be a given set of variables. The set $\mathfrak{C}$ of terms of combinatory logic is given by
the grammar:

    $A, B ::= x \mid S \mid K \mid AB,$

where $x$ ranges over the elements of $V$.

On $\mathfrak{C}$, we define combinatory equivalence $=_c$ as the smallest equivalence
relation satisfying $SABC =_c (AC)(BC)$, $KAB =_c A$, and the rules $(cong_1)$ and
$(cong_2)$ (see Section 2.5). Then the set $\mathfrak{C}/{=_c}$ is a combinatory algebra (called the
free combinatory algebra generated by $V$, or the term algebra). You will prove in
Exercise 20 that it is non-trivial.

Exercise 20. On the set $\mathfrak{C}$ of combinatory terms, define a notion of single-step
reduction by the following laws:

    $SABC \to_c (AC)(BC),$
    $KAB \to_c A,$

together with the usual rules $(cong_1)$ and $(cong_2)$ (see Section 2.5). As in lambda
calculus, we call a term a normal form if it cannot be reduced. Prove that the
reduction $\to_c$ satisfies the Church-Rosser property. (Hint: similarly to the lambda
calculus, first define a suitable parallel one-step reduction $\triangleright$ whose reflexive
transitive closure is that of $\to_c$. Then show that it satisfies the diamond property.)

Corollary 5.7. It immediately follows from the Church-Rosser Theorem for
combinatory logic (Exercise 20) that two normal forms are $=_c$-equivalent if and only
if they are equal.
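
The reduction rules of Exercise 20 are easy to experiment with. The Python sketch below is an illustration only and proves nothing about the Church-Rosser property: it encodes combinatory terms as nested pairs (an assumption of this example), always reduces the leftmost available redex first (one strategy among several), and stops after a fixed step budget. The names `step`, `normalize`, `I`, and `T2` are hypothetical.

```python
def step(t):
    """Perform one reduction step, or return None if t is a normal form."""
    if not isinstance(t, tuple):
        return None                      # S, K and variables do not reduce
    f, c = t
    if isinstance(f, tuple) and f[0] == "K":
        return f[1]                      # K A B -> A
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
        a, b = f[0][1], f[1]
        return ((a, c), (b, c))          # S A B C -> (A C)(B C)
    r = step(f)                          # congruence: reduce inside the left part ...
    if r is not None:
        return (r, c)
    r = step(c)                          # ... or inside the right part
    if r is not None:
        return (f, r)
    return None

def normalize(t, fuel=1000):
    """Reduce t until no step applies, giving up after `fuel` steps."""
    for _ in range(fuel):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step budget")

I = (("S", "K"), "K")                    # the term SKK
T2 = (("S", ("K", I)), I)                # the term S(K(SKK))(SKK)

print(normalize((I, "x")), normalize((T2, "x")))
print(normalize(I) == I, normalize(T2) == T2, I == T2)
```

Both terms send the variable `x` to itself, yet each is already a normal form and the two are syntactically different, so by Corollary 5.7 they are not $=_c$-equivalent.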



5.4 The failure of soundness for combinatory algebras 

A combinatory algebra is almost a model of the lambda calculus. Indeed, given
a combinatory algebra $\mathbf{A}$, we can interpret any lambda term as follows. To each
term $M$ with free variables among $x_1, \ldots, x_n$, we recursively associate a
polynomial $[\![M]\!] \in \mathbf{A}\{x_1, \ldots, x_n\}$:

    $[\![x]\!] := x,$
    $[\![NP]\!] := [\![N]\!][\![P]\!],$
    $[\![\lambda x.M]\!] := \lambda^* x.[\![M]\!].$






    Rule        Premises                 Conclusion
    (refl)      (none)                   $M = M$
    (symm)      $M = N$                  $N = M$
    (trans)     $M = N$, $N = P$         $M = P$
    (cong)      $M = M'$, $N = N'$       $MN = M'N'$
    ($\xi$)     $M = M'$                 $\lambda x.M = \lambda x.M'$
    ($\beta$)   (none)                   $(\lambda x.M)N = M[N/x]$

Table 2: The rules for $\beta$-equivalence



Notice that this definition is almost the identity function, except that we have
replaced the ordinary lambda abstractor of lambda calculus by the derived lambda
abstractor of combinatory logic. The result is a polynomial in $\mathbf{A}\{x_1, \ldots, x_n\}$. In
the particular case where $M$ is a closed term, we can regard $[\![M]\!]$ as an element of
$\mathbf{A}$.

To be able to say that $\mathbf{A}$ is a "model" of the lambda calculus, we would like the
following property to be true:

    $M =_\beta N \;\Rightarrow\; [\![M]\!] = [\![N]\!]$ holds in $\mathbf{A}$.

This property is called soundness of the interpretation. Unfortunately, it is in
general false for combinatory algebras, as the following example shows.

Example 5.8. Let $M = \lambda x.x$ and $N = \lambda x.(\lambda y.y)x$. Then clearly $M =_\beta N$. On
the other hand,

    $[\![M]\!] = \lambda^* x.x = i,$
    $[\![N]\!] = \lambda^* x.(\lambda^* y.y)x = \lambda^* x.ix = s(ki)i.$

It follows from Exercise 20 and Corollary 5.7 that the equation $i = s(ki)i$ does
not hold in the combinatory algebra $\mathfrak{C}/{=_c}$. In other words, the interpretation is
not sound.
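
The two interpretations in Example 5.8 can be computed mechanically. The Python sketch below is a hedged illustration: it works with purely symbolic polynomials over the constants `"s"` and `"k"`, encodes lambda terms as tagged tuples, assumes that no variable is named `s` or `k`, and treats any variable-free polynomial as an element of $\mathbf{A}$ (so that $\lambda^* x.i = ki$, as in the example). The names `is_closed`, `abstract`, and `interpret` are hypothetical.

```python
I = (("s", "k"), "k")                        # i = skk

def is_closed(t):
    """True if the polynomial t contains no variables, so it denotes an element of A."""
    if isinstance(t, str):
        return t in ("s", "k")
    return is_closed(t[0]) and is_closed(t[1])

def abstract(x, t):
    """The derived lambda abstractor lambda* x. t."""
    if t == x:
        return I
    if isinstance(t, str) or is_closed(t):   # a variable other than x, or an element of A
        return ("k", t)
    p, q = t
    return (("s", abstract(x, p)), abstract(x, q))

def interpret(m):
    """Map a lambda term to its polynomial [[m]]."""
    if isinstance(m, str):                   # [[x]] = x
        return m
    if m[0] == "lam":                        # [[lambda x.M]] = lambda* x.[[M]]
        _, x, body = m
        return abstract(x, interpret(body))
    _, n, p = m                              # [[NP]] = [[N]][[P]]
    return (interpret(n), interpret(p))

M = ("lam", "x", "x")                                   # lambda x.x
N = ("lam", "x", ("app", ("lam", "y", "y"), "x"))       # lambda x.(lambda y.y)x
print(interpret(M))
print(interpret(N))
```

The first result is the encoding of $i$ and the second the encoding of $s(ki)i$; they are different polynomials even though $M =_\beta N$.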

Let us analyze the failure of the soundness property further. Recall that
$\beta$-equivalence is the smallest equivalence relation on lambda terms satisfying the
six rules in Table 2.

If we define a relation $\sim$ on lambda terms by

    $M \sim N \iff [\![M]\!] = [\![N]\!]$ holds in $\mathbf{A}$,

then we may ask which of the six rules of Table 2 the relation $\sim$ satisfies. Clearly,
not all six rules can be satisfied, or else we would have
$M =_\beta N \Rightarrow M \sim N \Rightarrow [\![M]\!] = [\![N]\!]$, i.e., the model would be sound.

Clearly, $\sim$ is an equivalence relation, and therefore satisfies (refl), (symm), and
(trans). Also, (cong) is satisfied, because whenever $p, q, p', q'$ are polynomials
such that $p = p'$ and $q = q'$ holds in $\mathbf{A}$, then clearly $pq = p'q'$ holds in $\mathbf{A}$ as well.
Finally, we know from Remark 5.3 that the rule ($\beta$) is satisfied.

So the rule that fails is the ($\xi$) rule. Indeed, Example 5.8 illustrates this. Note
that $x \sim (\lambda y.y)x$ (from the proof of Theorem 5.1), but $\lambda x.x \not\sim \lambda x.(\lambda y.y)x$, and
therefore the ($\xi$) rule is violated.



5.5 Lambda algebras 

A lambda algebra is, by definition, a combinatory algebra that is a sound model 
of lambda calculus, and in which s and k have their expected meanings. 

Definition (Lambda algebra). A lambda algebra is a combinatory algebra $\mathbf{A}$
satisfying the following properties:

    $(\forall M, N \in \Lambda)\;\; M =_\beta N \Rightarrow [\![M]\!] = [\![N]\!]$        (soundness),
    $s = \lambda^* x.\lambda^* y.\lambda^* z.(xz)(yz)$                      (s-derived),
    $k = \lambda^* x.\lambda^* y.x$                                    (k-derived).

The purpose of the remainder of this section is to give an axiomatic description of
lambda algebras.

Lemma 5.9. Recall that $\Lambda_0$ is the set of closed lambda terms, i.e., lambda terms
without free variables. Soundness is equivalent to the following:

    $(\forall M, N \in \Lambda_0)\;\; M =_\beta N \Rightarrow [\![M]\!] = [\![N]\!]$      (closed soundness)

Proof. Clearly soundness implies closed soundness. For the converse, assume
closed soundness and let $M, N \in \Lambda$ with $M =_\beta N$. Let
$FV(M) \cup FV(N) = \{x_1, \ldots, x_n\}$. Then

    $M =_\beta N$
    $\Rightarrow\; \lambda x_1 \ldots x_n.M =_\beta \lambda x_1 \ldots x_n.N$                        by ($\xi$)
    $\Rightarrow\; [\![\lambda x_1 \ldots x_n.M]\!] = [\![\lambda x_1 \ldots x_n.N]\!]$              by closed soundness
    $\Rightarrow\; \lambda^* x_1 \ldots x_n.[\![M]\!] = \lambda^* x_1 \ldots x_n.[\![N]\!]$          by def. of $[\![-]\!]$
    $\Rightarrow\; (\lambda^* x_1 \ldots x_n.[\![M]\!]) x_1 \ldots x_n = (\lambda^* x_1 \ldots x_n.[\![N]\!]) x_1 \ldots x_n$
    $\Rightarrow\; [\![M]\!] = [\![N]\!]$                                          by proof of Thm 5.1

This proves soundness. □

Definition (Translations between combinatory logic and lambda calculus). Let
$A \in \mathfrak{C}$ be a combinatory term (see Example 5.6). We define its translation to
lambda calculus in the obvious way: the translation $A_\lambda$ is given recursively by:

    $S_\lambda    = \lambda xyz.(xz)(yz),$
    $K_\lambda    = \lambda xy.x,$
    $x_\lambda    = x,$
    $(AB)_\lambda = A_\lambda B_\lambda.$

Conversely, given a lambda term $M \in \Lambda$, we recursively define its translation $M_c$
to combinatory logic like this:

    $x_c             = x,$
    $(MN)_c          = M_c N_c,$
    $(\lambda x.M)_c = \lambda^* x.(M_c).$

Lemma 5.10. For all lambda terms $M$, $(M_c)_\lambda =_\beta M$.

Lemma 5.11. Let $\mathbf{A}$ be a combinatory algebra satisfying $k = \lambda^* x.\lambda^* y.x$ and
$s = \lambda^* x.\lambda^* y.\lambda^* z.(xz)(yz)$. Then for all combinatory terms $A$, $(A_\lambda)_c = A$ holds
in $\mathbf{A}$.

Exercise 21. Prove Lemmas 5.10 and 5.11.

Let $\mathfrak{C}_0$ be the set of closed combinatory terms. The following is our first useful
characterization of lambda algebras.

Lemma 5.12. Let $\mathbf{A}$ be a combinatory algebra. Then $\mathbf{A}$ is a lambda algebra if
and only if it satisfies the following property:

    $(\forall A, B \in \mathfrak{C}_0)\;\; A_\lambda =_\beta B_\lambda \Rightarrow A = B$ holds in $\mathbf{A}$.      (alt-soundness)






Proof. First, assume that $\mathbf{A}$ satisfies (alt-soundness). To prove (closed soundness),
let $M, N$ be lambda terms with $M =_\beta N$. Then $(M_c)_\lambda =_\beta M =_\beta N =_\beta (N_c)_\lambda$,
hence by (alt-soundness), $M_c = N_c$ holds in $\mathbf{A}$. But this is the definition of
$[\![M]\!] = [\![N]\!]$.

To prove (k-derived), note that

    $k_\lambda = (\lambda x.\lambda y.x)$                          by definition of $(-)_\lambda$
    $\phantom{k_\lambda} =_\beta ((\lambda x.\lambda y.x)_c)_\lambda$      by Lemma 5.10
    $\phantom{k_\lambda} = (\lambda^* x.\lambda^* y.x)_\lambda$            by definition of $(-)_c$.

Hence, by (alt-soundness), it follows that $k = (\lambda^* x.\lambda^* y.x)$ holds in $\mathbf{A}$. Similarly
for (s-derived).

Conversely, assume that $\mathbf{A}$ is a lambda algebra. Let $A, B \in \mathfrak{C}_0$ and assume
$A_\lambda =_\beta B_\lambda$. By soundness, $[\![A_\lambda]\!] = [\![B_\lambda]\!]$. By definition of the interpretation,
$(A_\lambda)_c = (B_\lambda)_c$ holds in $\mathbf{A}$. But by (s-derived), (k-derived), and Lemma 5.11,
$A = (A_\lambda)_c = (B_\lambda)_c = B$ holds in $\mathbf{A}$, proving (alt-soundness). □

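Definition (Homomorphism). Let $(\mathbf{A}, \cdot_{\mathbf{A}}, s_{\mathbf{A}}, k_{\mathbf{A}})$, $(\mathbf{B}, \cdot_{\mathbf{B}}, s_{\mathbf{B}}, k_{\mathbf{B}})$ be
combinatory algebras. A homomorphism of combinatory algebras is a function
$\varphi : \mathbf{A} \to \mathbf{B}$ such that $\varphi(s_{\mathbf{A}}) = s_{\mathbf{B}}$, $\varphi(k_{\mathbf{A}}) = k_{\mathbf{B}}$, and for all
$a, b \in \mathbf{A}$, $\varphi(a \cdot_{\mathbf{A}} b) = \varphi(a) \cdot_{\mathbf{B}} \varphi(b)$.

Any given homomorphism $\varphi : \mathbf{A} \to \mathbf{B}$ can be extended to polynomials in the
obvious way; we define $\hat\varphi : \mathbf{A}\{x_1, \ldots, x_n\} \to \mathbf{B}\{x_1, \ldots, x_n\}$ by

    $\hat\varphi(a)  = \varphi(a)$                    for $a \in \mathbf{A}$,
    $\hat\varphi(x)  = x$                            if $x \in \{x_1, \ldots, x_n\}$,
    $\hat\varphi(pq) = \hat\varphi(p)\hat\varphi(q).$

Example 5.13. If $\varphi(a) = a'$ and $\varphi(b) = b'$, then $\hat\varphi((ax)(by)) = (a'x)(b'y)$.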
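
The extension of a homomorphism to polynomials is a simple structural recursion, which the following Python sketch spells out. It assumes polynomials are encoded as nested pairs with strings for variables and coefficients; the helper `extend` and the lookup table used for `phi` are hypothetical, and the table is only a stand-in map on two coefficients, not a verified homomorphism.

```python
def extend(phi, variables):
    """Return the extension of phi to polynomials over the given variables."""
    def phi_hat(p):
        if isinstance(p, tuple):          # an application pq
            return (phi_hat(p[0]), phi_hat(p[1]))
        if p in variables:                # a variable is left unchanged
            return p
        return phi(p)                     # a coefficient is mapped by phi
    return phi_hat

# Stand-in for phi on the two coefficients of Example 5.13.
phi = {"a": "a'", "b": "b'"}.get
phi_hat = extend(phi, {"x", "y"})

image = phi_hat((("a", "x"), ("b", "y")))  # the polynomial (ax)(by)
print(image)
```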

The following is the main technical concept needed in the characterization of 
lambda algebras. We say that an equation holds absolutely if it holds in A and in 
any homomorphic image of A. If an equation holds only in the previous sense, 
then we sometimes say it holds locally. 

Definition (Absolute equation). Let $p, q \in \mathbf{A}\{x_1, \ldots, x_n\}$ be two polynomials
with coefficients in $\mathbf{A}$. We say that the equation $p = q$ holds absolutely in $\mathbf{A}$ if for
all combinatory algebras $\mathbf{B}$ and all homomorphisms $\varphi : \mathbf{A} \to \mathbf{B}$, $\hat\varphi(p) = \hat\varphi(q)$
holds in $\mathbf{B}$. If an equation holds absolutely, we write $p =_{abs} q$.

    (a)  $1k =_{abs} k,$
    (b)  $1s =_{abs} s,$
    (c)  $1(kx) =_{abs} kx,$
    (d)  $1(sx) =_{abs} sx,$
    (e)  $1(sxy) =_{abs} sxy,$
    (f)  $s(s(kk)x)y =_{abs} 1x,$
    (g)  $s(s(s(ks)x)y)z =_{abs} s(sxz)(syz),$
    (h)  $k(xy) =_{abs} s(kx)(ky),$
    (i)  $s(kx)i =_{abs} 1x.$

Table 3: An axiomatization of lambda algebras. Here $1 = s(ki)$.

We can now state the main theorem characterizing lambda algebras. Let $1 = s(ki)$.

Theorem 5.14. Let $\mathbf{A}$ be a combinatory algebra. Then the following are
equivalent:

1. $\mathbf{A}$ is a lambda algebra,

2. $\mathbf{A}$ satisfies (alt-soundness),

3. for all $A, B \in \mathfrak{C}$ such that $A_\lambda =_\beta B_\lambda$, the equation $A = B$ holds absolutely
   in $\mathbf{A}$,

4. $\mathbf{A}$ absolutely satisfies the nine axioms in Table 3,

5. $\mathbf{A}$ satisfies (s-derived) and (k-derived), and for all $p, q \in \mathbf{A}\{y_1, \ldots, y_n\}$, if
   $px =_{abs} qx$ then $1p =_{abs} 1q$,

6. $\mathbf{A}$ satisfies (s-derived) and (k-derived), and for all $p, q \in \mathbf{A}\{x, y_1, \ldots, y_n\}$,
   if $p =_{abs} q$ then $\lambda^* x.p =_{abs} \lambda^* x.q$.

The proof proceeds via $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 4 \Rightarrow 5 \Rightarrow 6 \Rightarrow 1$.

We have already proven $1 \Rightarrow 2$ in Lemma 5.12.

To prove $2 \Rightarrow 3$, let $FV(A) \cup FV(B) \subseteq \{x_1, \ldots, x_n\}$, and assume
$A_\lambda =_\beta B_\lambda$. Then $\lambda x_1 \ldots x_n.(A_\lambda) =_\beta \lambda x_1 \ldots x_n.(B_\lambda)$, hence
$(\lambda^* x_1 \ldots x_n.A)_\lambda =_\beta (\lambda^* x_1 \ldots x_n.B)_\lambda$ (why?). Since the latter terms are
closed, it follows by the rule (alt-soundness) that
$\lambda^* x_1 \ldots x_n.A = \lambda^* x_1 \ldots x_n.B$ holds in $\mathbf{A}$. Since closed equations are
preserved by homomorphisms, the latter also holds in $\mathbf{B}$ for any
homomorphism $\varphi : \mathbf{A} \to \mathbf{B}$. Finally, this implies that $A = B$ holds for any such
$\mathbf{B}$, proving that $A = B$ holds absolutely in $\mathbf{A}$.

Exercise 22. Prove the implication $3 \Rightarrow 4$.

The implication $4 \Rightarrow 5$ is the most difficult part of the theorem. We first dispense
with the easier part:

Exercise 23. Prove that the axioms from Table 3 imply (s-derived) and (k-derived).

The last part of $4 \Rightarrow 5$ needs the following lemma:

Lemma 5.15. Suppose $\mathbf{A}$ satisfies the nine axioms from Table 3. Define a
structure $(\mathbf{B}, \bullet, S, K)$ by:

    $\mathbf{B} = \{a \in \mathbf{A} \mid a = 1a\},$
    $a \bullet b = sab,$
    $S = ks,$
    $K = kk.$

Then $\mathbf{B}$ is a well-defined combinatory algebra. Moreover, the function
$\varphi : \mathbf{A} \to \mathbf{B}$ defined by $\varphi(a) = ka$ defines a homomorphism.

Exercise 24. Prove Lemma 5.15.

To prove the implication $4 \Rightarrow 5$, assume $ax = bx$ holds absolutely in $\mathbf{A}$. Then
$\hat\varphi(ax) = \hat\varphi(bx)$ holds in $\mathbf{B}$ by definition of "absolute". But
$\hat\varphi(ax) = (\varphi a) \bullet x = s(ka)x$ and $\hat\varphi(bx) = (\varphi b) \bullet x = s(kb)x$. Therefore
$s(ka)x = s(kb)x$ holds in $\mathbf{A}$. We plug in $x = i$ to get $s(ka)i = s(kb)i$. By axiom (i),
$1a = 1b$.

To prove $5 \Rightarrow 6$, assume $p =_{abs} q$. Then $(\lambda^* x.p)x =_{abs} p =_{abs} q =_{abs} (\lambda^* x.q)x$
by the proof of Theorem 5.1. Then by 5., $(\lambda^* x.p) =_{abs} (\lambda^* x.q)$.

Finally, to prove $6 \Rightarrow 1$, note that if 6 holds, then the absolute interpretation
satisfies the $\xi$-rule, and therefore satisfies all the axioms of lambda calculus.

Exercise 25. Prove $6 \Rightarrow 1$.

Remark 5.16. The axioms in Table 3 are required to hold absolutely. They can
be replaced by local axioms by prefacing each axiom with $\lambda^* xyz$. Note that this
makes the axioms much longer.

5.6 Extensional combinatory algebras 

Definition. An applicative structure $(\mathbf{A}, \cdot)$ is extensional if for all $a, b \in \mathbf{A}$, if
$ac = bc$ holds for all $c \in \mathbf{A}$, then $a = b$.

Proposition 5.17. In an extensional combinatory algebra, the ($\eta$) axiom is valid.

Proof. By ($\beta$), $(\lambda^* x.Mx)c = Mc$ for all $c \in \mathbf{A}$. Therefore, by extensionality,
$(\lambda^* x.Mx) = M$. □

Proposition 5.18. In an extensional combinatory algebra, an equation holds lo- 
cally if and only if it holds absolutely. 

Proof. Clearly, if an equation holds absolutely, then it holds locally. Conversely,
assume the equation $p = q$ holds locally in $\mathbf{A}$. Let $x_1, \ldots, x_n$ be the variables
occurring in the equation. By ($\beta$),

    $(\lambda^* x_1 \ldots x_n.p) x_1 \ldots x_n = (\lambda^* x_1 \ldots x_n.q) x_1 \ldots x_n$

holds locally. By extensionality,

    $\lambda^* x_1 \ldots x_n.p = \lambda^* x_1 \ldots x_n.q$

holds. Since this is a closed equation (no free variables), it automatically holds
absolutely. This implies that
$(\lambda^* x_1 \ldots x_n.p) x_1 \ldots x_n = (\lambda^* x_1 \ldots x_n.q) x_1 \ldots x_n$
holds absolutely, and finally, by ($\beta$) again, that $p = q$ holds absolutely.

Proposition 5.19. Every extensional combinatory algebra is a lambda algebra. 

Proof. By Theorem 5.14 (6), it suffices to prove (s-derived), (k-derived) and the
($\xi$)-rule. Let $a, b, c \in \mathbf{A}$ be arbitrary. Then

    $(\lambda^* x.\lambda^* y.\lambda^* z.(xz)(yz))abc = (ac)(bc) = sabc$

by ($\beta$) and definition of $s$. Applying extensionality three times (with respect to $c$,
$b$, and $a$), we get

    $\lambda^* x.\lambda^* y.\lambda^* z.(xz)(yz) = s.$

This proves (s-derived). The proof of (k-derived) is similar. Finally, to prove ($\xi$),
assume that $p =_{abs} q$. Then by ($\beta$), $(\lambda^* x.p)c = (\lambda^* x.q)c$ for all $c \in \mathbf{A}$. By
extensionality, $\lambda^* x.p = \lambda^* x.q$ holds.

6 Simply-typed lambda calculus, propositional logic,
and the Curry-Howard isomorphism

In the untyped lambda calculus, we spoke about functions without speaking about 
their domains and codomains. The domain and codomain of any function was the 
set of all lambda terms. We now introduce types into the lambda calculus, and thus 
a notion of domain and codomain for functions. The difference between types and 
sets is that types are syntactic objects, i.e., we can speak of types without having 
to speak of their elements. We can think of types as names for sets. 

6.1 Simple types and simply-typed terms 

We assume a set of basic types. We usually use the Greek letter $\iota$ ("iota") to denote
a basic type. The set of simple types is given by the following BNF:

    Simple types:  $A, B ::= \iota \mid A \to B \mid A \times B \mid 1$

The intended meaning of these types is as follows: base types are things like the
type of integers or the type of booleans. The type $A \to B$ is the type of functions
from $A$ to $B$. The type $A \times B$ is the type of pairs $\langle x, y \rangle$, where $x$ has type $A$ and
$y$ has type $B$. The type $1$ is a one-element type. You can think of $1$ as an abridged
version of the booleans, in which there is only one boolean instead of two. Or you
can think of $1$ as the "void" or "unit" type in many programming languages: the
result type of a function that has no real result.

When we write types, we adopt the convention that $\times$ binds stronger than $\to$, and
$\to$ associates to the right. Thus, $A \times B \to C$ is $(A \times B) \to C$, and $A \to B \to C$
is $A \to (B \to C)$.

The set of raw typed lambda terms is given by the following BNF:

    Raw terms:  $M, N ::= x \mid MN \mid \lambda x^A.M \mid \langle M, N \rangle \mid \pi_1 M \mid \pi_2 M \mid *$

Unlike what we did in the untyped lambda calculus, we have added special syntax
here for pairs. Specifically, $\langle M, N \rangle$ is a pair of terms, $\pi_i M$ is a projection, with
the intention that $\pi_i \langle M_1, M_2 \rangle = M_i$. Also, we have added a term $*$, which is the
unique element of the type $1$. One other change from the untyped lambda calculus
is that we now write $\lambda x^A.M$ for a lambda abstraction to indicate that $x$ has type
$A$. However, we will sometimes omit the superscripts and write $\lambda x.M$ as before.
The notions of free and bound variables and $\alpha$-conversion are defined as for the
untyped lambda calculus; again we identify $\alpha$-equivalent terms.



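    Rule        Premises                                              Conclusion
    (var)       (none)                                                $\Gamma, x{:}A \vdash x : A$
    (app)       $\Gamma \vdash M : A \to B$, $\Gamma \vdash N : A$    $\Gamma \vdash MN : B$
    (abs)       $\Gamma, x{:}A \vdash M : B$                          $\Gamma \vdash \lambda x^A.M : A \to B$
    (pair)      $\Gamma \vdash M : A$, $\Gamma \vdash N : B$          $\Gamma \vdash \langle M, N \rangle : A \times B$
    ($\pi_1$)   $\Gamma \vdash M : A \times B$                        $\Gamma \vdash \pi_1 M : A$
    ($\pi_2$)   $\Gamma \vdash M : A \times B$                        $\Gamma \vdash \pi_2 M : B$
    ($*$)       (none)                                                $\Gamma \vdash * : 1$

Table 4: Typing rules for the simply-typed lambda calculus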

We call the above terms the raw terms, because we have not yet imposed any
typing discipline on these terms. To avoid meaningless terms such as $\langle M, N \rangle(P)$
or $\pi_1(\lambda x.M)$, we introduce typing rules.

We use the colon notation $M : A$ to mean "$M$ is of type $A$" (similar to the
element notation in set theory). The typing rules are expressed in terms of typing
judgments. A typing judgment is an expression of the form

    $x_1{:}A_1, x_2{:}A_2, \ldots, x_n{:}A_n \vdash M : A.$

Its meaning is: "under the assumption that $x_i$ is of type $A_i$, for $i = 1 \ldots n$,
the term $M$ is a well-typed term of type $A$." The free variables of $M$ must be
contained in $x_1, \ldots, x_n$. The idea is that in order to determine the type of $M$, we
must make some assumptions about the type of its free variables. For instance, the
term $xy$ will have type $B$ if $x{:}A \to B$ and $y{:}A$. Clearly, the type of $xy$ depends
on the type of its free variables.

A sequence of assumptions of the form $x_1{:}A_1, \ldots, x_n{:}A_n$, as in the left-hand side
of a typing judgment, is called a typing context. We always assume that no variable
appears more than once in a typing context, and we allow typing contexts to be
re-ordered implicitly. We often use the Greek letter $\Gamma$ to stand for an arbitrary typing
context, and we use the notations $\Gamma, \Gamma'$ and $\Gamma, x{:}A$ to denote the concatenation of
typing contexts, where it is always assumed that the sets of variables are disjoint.

The symbol $\vdash$, which appears in a typing judgment, is called the turnstile symbol.
Its purpose is to separate the left-hand side from the right-hand side.

The typing rules for the simply-typed lambda calculus are shown in Table 4. The
rule (var) is a tautology: under the assumption that $x$ has type $A$, $x$ has type $A$.
The rule (app) states that a function of type $A \to B$ can be applied to an argument
of type $A$ to produce a result of type $B$. The rule (abs) states that if $M$ is a term of
type $B$ with a free variable $x$ of type $A$, then $\lambda x^A.M$ is a function of type $A \to B$.
The other rules have similar interpretations.

Here is an example of a valid typing derivation:

$$
\dfrac{
  \dfrac{
    \dfrac{
      x{:}A \to A,\, y{:}A \vdash x : A \to A
      \qquad
      \dfrac{x{:}A \to A,\, y{:}A \vdash x : A \to A \qquad x{:}A \to A,\, y{:}A \vdash y : A}
            {x{:}A \to A,\, y{:}A \vdash xy : A}
    }{x{:}A \to A,\, y{:}A \vdash x(xy) : A}
  }{x{:}A \to A \vdash \lambda y^A.x(xy) : A \to A}
}{\vdash \lambda x^{A \to A}.\lambda y^A.x(xy) : (A \to A) \to A \to A}
$$

One important property of these typing rules is that there is precisely one rule 
for each kind of lambda term. Thus, when we construct typing derivations in a 
bottom-up fashion, there is always a unique choice of which rule to apply next. 
The only real choice we have is about which types to assign to variables. 
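
Because exactly one rule applies to each kind of term, the typing rules can be read as a recursive procedure. The following Python sketch is one way to do this, under assumptions that are not part of the notes: terms and types are encoded as tagged tuples, every lambda carries its type annotation, and a rebinding of a variable simply shadows the earlier entry in the context (rather than relying on renaming of bound variables). The function name `type_of` and the encoding are hypothetical.

```python
def type_of(ctx, m):
    """Return the type of raw term m in typing context ctx (a dict), or raise TypeError."""
    tag = m[0]
    if tag == "var":                              # rule (var)
        if m[1] not in ctx:
            raise TypeError(f"unbound variable {m[1]}")
        return ctx[m[1]]
    if tag == "lam":                              # rule (abs)
        _, x, a, body = m
        return ("->", a, type_of({**ctx, x: a}, body))
    if tag == "app":                              # rule (app)
        f, arg = type_of(ctx, m[1]), type_of(ctx, m[2])
        if isinstance(f, tuple) and f[0] == "->" and f[1] == arg:
            return f[2]
        raise TypeError("ill-typed application")
    if tag == "pair":                             # rule (pair)
        return ("*", type_of(ctx, m[1]), type_of(ctx, m[2]))
    if tag == "proj":                             # rules (pi_1) and (pi_2)
        _, i, body = m
        t = type_of(ctx, body)
        if isinstance(t, tuple) and t[0] == "*":
            return t[i]
        raise TypeError("projection of a non-pair")
    if tag == "unit":                             # rule (*)
        return "1"
    raise TypeError(f"unknown term {m!r}")

# The term from the example derivation: lambda x^(A->A). lambda y^A. x(xy).
AA = ("->", "A", "A")
twice = ("lam", "x", AA, ("lam", "y", "A",
         ("app", ("var", "x"), ("app", ("var", "x"), ("var", "y")))))
print(type_of({}, twice))
```

The result encodes $(A \to A) \to A \to A$, matching the conclusion of the derivation above. Note that this sketch only checks a term against the annotations it is given; it does not search for annotations, so a `TypeError` for one annotation does not by itself show that a term is untypeable.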

Exercise 26. Give a typing derivation of each of the following typing judgments:

(a) $\vdash \lambda x^{(A \to A) \to B}.x(\lambda y^A.y) : ((A \to A) \to B) \to B$

(b) $\vdash \lambda x^{A \times B}.\langle \pi_2 x, \pi_1 x \rangle : (A \times B) \to (B \times A)$

Not all terms are typeable. For instance, the terms $\pi_1(\lambda x.M)$ and $\langle M, N \rangle(P)$
cannot be assigned a type, and neither can the term $\lambda x.xx$. Here, by "assigning
a type" we mean assigning types to the free and bound variables such that the
corresponding typing judgment is derivable. We say that a term is typeable if it
can be assigned a type.

Exercise 27. Show that none of the three terms mentioned in the previous
paragraph is typeable.

Exercise 28. We said that we will identify $\alpha$-equivalent terms. Show that this
is actually necessary. In particular, show that if we didn't identify $\alpha$-equivalent
terms, there would be no valid derivation of the typing judgment

    $\vdash \lambda x^A.\lambda x^B.x : A \to B \to B.$

Give a derivation of this typing judgment using the bound variable convention.

6.2 Connections to propositional logic

Consider the following types:

(1) $(A \times B) \to A$

(2) $A \to B \to (A \times B)$

(3) $(A \to B) \to (B \to C) \to (A \to C)$

(4) $A \to A \to A$

(5) $((A \to A) \to B) \to B$

(6) $A \to (A \times B)$

(7) $(A \to C) \to C$

Let us ask, in each case, whether it is possible to find a closed term of the given 
type. We find the following terms: 

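(1) $\lambda x^{A \times B}.\pi_1 x$

(2) $\lambda x^A.\lambda y^B.\langle x, y \rangle$

(3) $\lambda x^{A \to B}.\lambda y^{B \to C}.\lambda z^A.y(xz)$

(4) $\lambda x^A.\lambda y^A.x$ and $\lambda x^A.\lambda y^A.y$

(5) $\lambda x^{(A \to A) \to B}.x(\lambda y^A.y)$

(6) can't find a closed term

(7) can't find a closed term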

Can we answer the general question, given a type, whether there exists a closed 
term for it? 

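For a new way to look at the problem, take the types (1)-(7) and make the
following replacement of symbols: replace "$\to$" by "$\Rightarrow$" and replace "$\times$" by "$\wedge$". We
obtain the following formulas:

(1) $(A \wedge B) \Rightarrow A$

(2) $A \Rightarrow B \Rightarrow (A \wedge B)$

(3) $(A \Rightarrow B) \Rightarrow (B \Rightarrow C) \Rightarrow (A \Rightarrow C)$

(4) $A \Rightarrow A \Rightarrow A$

(5) $((A \Rightarrow A) \Rightarrow B) \Rightarrow B$

(6) $A \Rightarrow (A \wedge B)$

(7) $(A \Rightarrow C) \Rightarrow C$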

Note that these are formulas of propositional logic, where "$\Rightarrow$" is implication, and
"$\wedge$" is conjunction ("and"). What can we say about the validity of these formulas?
It turns out that (1)-(5) are tautologies, whereas (6)-(7) are not. Thus, the types
for which we could find a lambda term turn out to be the ones that are valid when
considered as formulas in propositional logic! This is not entirely coincidental.
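
The claim about formulas (1)-(7) can be checked by brute force over truth assignments. The Python sketch below does this with ordinary two-valued truth tables; the helper names `implies` and `is_tautology` and the dictionary `formulas` are illustrative assumptions, and implication is taken to associate to the right, as for the arrow on types.

```python
from itertools import product

def implies(p, q):
    """Classical (truth-table) implication."""
    return (not p) or q

# Formulas (1)-(7), each as a function of the truth values of A, B, C.
formulas = {
    1: lambda A, B, C: implies(A and B, A),
    2: lambda A, B, C: implies(A, implies(B, A and B)),
    3: lambda A, B, C: implies(implies(A, B), implies(implies(B, C), implies(A, C))),
    4: lambda A, B, C: implies(A, implies(A, A)),
    5: lambda A, B, C: implies(implies(implies(A, A), B), B),
    6: lambda A, B, C: implies(A, A and B),
    7: lambda A, B, C: implies(implies(A, C), C),
}

def is_tautology(f):
    """True if f holds under every assignment of truth values to A, B, C."""
    return all(f(*values) for values in product([False, True], repeat=3))

tautologies = [n for n, f in formulas.items() if is_tautology(f)]
print(tautologies)
```

For instance, (6) fails when $A$ is true and $B$ is false, and (7) fails when $A$ and $C$ are both false.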

Let us consider, for example, how to prove $(A \wedge B) \Rightarrow A$. The proof is very short.
It goes as follows: "Assume $A \wedge B$. Then, by the first part of that assumption,
$A$ holds. Thus $(A \wedge B) \Rightarrow A$." On the other hand, the lambda term of the
corresponding type is $\lambda x^{A \times B}.\pi_1 x$. You can see that there is a close connection
between the proof and the lambda term. Namely, if one reads $\lambda x^{A \times B}$ as "assume
$A \wedge B$ (call the assumption '$x$')", and if one reads $\pi_1 x$ as "by the first part of
assumption $x$", then this lambda term can be read as a proof of the proposition
$(A \wedge B) \Rightarrow A$.

This connection between simply-typed lambda calculus and propositional logic is 
known as the "Curry-Howard isomorphism". Since types of the lambda calculus 
correspond to formulas in propositional logic, and terms correspond to proofs, the 
concept is also known as the "proofs-as-programs" paradigm, or the "formulas- 
as-types" correspondence. We will make the actual correspondence more precise 
in the next two sections. 

Before we go any further, we must make one important point. When we make
precise the connection between simply-typed lambda calculus and
propositional logic, we will see that the appropriate logic is intuitionistic logic, and
not the ordinary classical logic that we are used to from mathematical practice.
The main difference between intuitionistic and classical logic is that the former
lacks the principles of "proof by contradiction" and "excluded middle". The
principle of proof by contradiction states that if the assumption "not $A$" leads to
a contradiction then we have proved $A$. The principle of excluded middle states
that either "$A$" or "not $A$" must be true.

Intuitionistic logic is also known as constructive logic, because all proofs in it 
are by construction. Thus, in intuitionistic logic, the only way to prove the ex- 
istence of some object is by actually constructing the object. This is in contrast 
with classical logic, where we may prove the existence of an object simply by 
deriving a contradiction from the assumption that the object doesn't exist. The 
disadvantage of constructive logic is that it is generally more difficult to prove 
things. The advantage is that once one has a proof, the proof can be transformed 
into an algorithm.

6.3 Propositional intuitionistic logic

We start by introducing a system for intuitionistic logic that uses only three
connectives: "$\wedge$", "$\to$", and "$\top$". Formulas $A, B, \ldots$ are built from atomic formulas
$\alpha, \beta, \ldots$ via the BNF

    Formulas:  $A, B ::= \alpha \mid A \to B \mid A \wedge B \mid \top.$

We now need to formalize proofs. The formalized proofs will be called
"derivations". The system we introduce here is known as natural deduction, and is due
to Gentzen (1935).

In natural deduction, derivations are certain kinds of trees. In general, we will be
dealing with derivations of a formula $A$ from a set of assumptions
$\Gamma = \{A_1, \ldots, A_n\}$. Such a derivation will be written schematically as

    $\begin{array}{c} x_1{:}A_1, \ldots, x_n{:}A_n \\ \vdots \\ A \end{array}$

We simplify the bookkeeping by giving a name to each assumption, and we will
use lower-case letters such as $x, y, z$ for such names. In using the above notation
for schematically writing a derivation of $A$ from assumptions $\Gamma$, it is understood
that the derivation may in fact use a given assumption more than once, or zero
times. The rules for constructing derivations are as follows:

1. (Axiom)

       $(ax)\;\dfrac{x{:}A}{A}\;x$

   is a derivation of $A$ from assumption $A$ (and possibly other assumptions
   that were used zero times). We have written the letter "$x$" next to the rule,
   to indicate precisely which assumption we have used here.

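2. ($\wedge$-introduction) If

       $\begin{array}{c} \Gamma \\ \vdots \\ A \end{array}$  and  $\begin{array}{c} \Gamma \\ \vdots \\ B \end{array}$

   are derivations of $A$ and $B$, respectively, then

       $(\wedge\text{-}I)\;\dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \end{array} \qquad \begin{array}{c} \Gamma \\ \vdots \\ B \end{array}}{A \wedge B}$

   is a derivation of $A \wedge B$. In other words, a proof of $A \wedge B$ is a proof of $A$
   and a proof of $B$.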

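3. ($\wedge$-elimination) If

       $\begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}$

   is a derivation of $A \wedge B$, then

       $(\wedge\text{-}E_1)\;\dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}}{A}$  and  $(\wedge\text{-}E_2)\;\dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}}{B}$

   are derivations of $A$ and $B$, respectively. In other words, from $A \wedge B$, we
   are allowed to conclude both $A$ and $B$.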

4. ($\top$-introduction)

       $(\top\text{-}I)\;\dfrac{}{\top}$

   is a derivation of $\top$ (possibly from some assumptions, which were not
   used). In other words, $\top$ is always true.

5. ($\to$-introduction) If

       $\begin{array}{c} \Gamma, x{:}A \\ \vdots \\ B \end{array}$

   is a derivation of $B$ from assumptions $\Gamma$ and $A$, then

       $(\to\text{-}I)\;\dfrac{\begin{array}{c} \Gamma, [x{:}A] \\ \vdots \\ B \end{array}}{A \to B}\;x$

   is a derivation of $A \to B$ from $\Gamma$ alone. Here, the assumption $x{:}A$ is no
   longer an assumption of the new derivation; we say that it has been
   "canceled". We indicate canceled assumptions by enclosing them in brackets [ ],
   and we indicate the place where the assumption was canceled by writing
   the letter $x$ next to the rule where it was canceled.

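6. ($\to$-elimination) If

       $\begin{array}{c} \Gamma \\ \vdots \\ A \to B \end{array}$  and  $\begin{array}{c} \Gamma \\ \vdots \\ A \end{array}$

   are derivations of $A \to B$ and $A$, respectively, then

       $(\to\text{-}E)\;\dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \to B \end{array} \qquad \begin{array}{c} \Gamma \\ \vdots \\ A \end{array}}{B}$

   is a derivation of $B$. In other words, from $A \to B$ and $A$, we are allowed
   to conclude $B$. This rule is sometimes called by its Latin name, "modus
   ponens".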

This finishes the definition of derivations in natural deduction. Note that, with the 
exception of the axiom, each rule belongs to some specific logical connective, and 
there are introduction and elimination rules. "A" and "^" have both introduction 
and elimination rules, whereas "T" only has an introduction rule. 

In natural deduction, like in real mathematical life, assumptions can be made at 
any time. The challenge is to get rid of assumptions once they are made. In the 
end, we would like to have a derivation of a given formula that depends on as 
few assumptions as possible — in fact, we don't regard the formula as proven 
unless we can derive it from no assumptions. The rule (^-/) allows us to discard 
temporary assumptions that we might have made during the proof. 

Exercise 29. Give a derivation, in natural deduction, for each of the formulas
(1)-(5) from Section 6.2.

6.4 An alternative presentation of natural deduction 

The above notation for natural deduction derivations suffers from a problem of 
presentation: since assumptions are first written down, later canceled dynamically,
it is not easy to see when each assumption in a finished derivation was canceled.

The following alternate presentation of natural deduction works by deriving entire 
judgments, rather than formulas. Rather than keeping track of assumptions as the 
leaves of a proof tree, we annotate each formula in a derivation with the entire set 
of assumptions that were used in deriving it. In practice, this makes derivations 
more verbose, by repeating most assumptions on each line. In theory, however, 
such derivations are easier to reason about. 

A judgment is a statement of the form $x_1{:}A_1, \ldots, x_n{:}A_n \vdash B$. It states that the
formula $B$ is a consequence of the (labeled) assumptions $A_1, \ldots, A_n$. The rules
of natural deduction can now be reformulated as rules for deriving judgments:



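1. (Axiom)

$$(ax_x)\ \frac{}{\Gamma, x{:}A \vdash A}$$

2. ($\wedge$-introduction)

$$(\wedge\text{-}I)\ \frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}$$

3. ($\wedge$-elimination)

$$(\wedge\text{-}E_1)\ \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A} \qquad\qquad (\wedge\text{-}E_2)\ \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash B}$$

4. ($\top$-introduction)

$$(\top\text{-}I)\ \frac{}{\Gamma \vdash \top}$$

5. ($\to$-introduction)

$$(\to\text{-}I_x)\ \frac{\Gamma, x{:}A \vdash B}{\Gamma \vdash A \to B}$$

6. ($\to$-elimination)

$$(\to\text{-}E)\ \frac{\Gamma \vdash A \to B \qquad \Gamma \vdash A}{\Gamma \vdash B}$$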



6.5 The Curry-Howard Isomorphism 

There is an obvious one-to-one correspondence between types of the simply-typed 
lambda calculus and the formulas of propositional intuitionistic logic introduced 
in Section 6.3 (provided that the set of basic types can be identified with the set of
atomic formulas). We will identify formulas and types from now on, where it is 
convenient to do so. 

Perhaps less obvious is the fact that derivations are in one-to-one correspondence 
with simply-typed lambda terms. To be precise, we will give a translation from 
derivations to lambda terms, and a translation from lambda terms to derivations, 
which are mutually inverse up to $\alpha$-equivalence.

To any derivation of $x_1{:}A_1, \ldots, x_n{:}A_n \vdash B$, we will associate a lambda term $M$
such that $x_1{:}A_1, \ldots, x_n{:}A_n \vdash M : B$ is a valid typing judgment. We define $M$ by
recursion on the definition of derivations. We prove simultaneously, by induction,
that $x_1{:}A_1, \ldots, x_n{:}A_n \vdash M : B$ is indeed a valid typing judgment.

1. (Axiom) If the derivation is

$$(ax_x)\ \frac{}{\Gamma, x{:}A \vdash A},$$

then the lambda term is $M = x$. Clearly, $\Gamma, x{:}A \vdash x : A$ is a valid typing
judgment by (var).

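2. ($\wedge$-introduction) If the derivation is

$$(\wedge\text{-}I)\ \frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B},$$

then the lambda term is $M = \langle P, Q\rangle$, where $P$ and $Q$ are the terms
associated to the two respective subderivations. By induction hypothesis,
$\Gamma \vdash P : A$ and $\Gamma \vdash Q : B$, thus $\Gamma \vdash \langle P, Q\rangle : A \times B$ by (pair).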

3. ($\wedge$-elimination) If the derivation is

$$(\wedge\text{-}E_1)\ \frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A},$$

then we let $M = \pi_1 P$, where $P$ is the term associated to the subderivation.
By induction hypothesis, $\Gamma \vdash P : A \times B$, thus $\Gamma \vdash \pi_1 P : A$ by ($\pi_1$). The
case of $(\wedge\text{-}E_2)$ is entirely symmetric.

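4. ($\top$-introduction) If the derivation is

$$(\top\text{-}I)\ \frac{}{\Gamma \vdash \top},$$

then let $M = *$. We have $\Gamma \vdash * : 1$ by ($*$).

5. ($\to$-introduction) If the derivation is

$$(\to\text{-}I_x)\ \frac{\Gamma, x{:}A \vdash B}{\Gamma \vdash A \to B},$$

then we let $M = \lambda x^A.P$, where $P$ is the term associated to the
subderivation. By induction hypothesis, $\Gamma, x{:}A \vdash P : B$, hence
$\Gamma \vdash \lambda x^A.P : A \to B$ by (abs).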

6. ($\to$-elimination) Finally, if the derivation is

$$(\to\text{-}E)\ \frac{\Gamma \vdash A \to B \qquad \Gamma \vdash A}{\Gamma \vdash B},$$

then we let $M = PQ$, where $P$ and $Q$ are the terms associated to the two
respective subderivations. By induction hypothesis, $\Gamma \vdash P : A \to B$ and
$\Gamma \vdash Q : A$, thus $\Gamma \vdash PQ : B$ by (app).

Conversely, given a well-typed lambda term $M$, with associated typing judgment
$\Gamma \vdash M : A$, we can construct a derivation of $A$ from assumptions $\Gamma$. We
define this derivation by recursion on the type derivation of $\Gamma \vdash M : A$. The
details are too tedious to spell out here; we simply go through each of the
rules (var), (app), (abs), (pair), ($\pi_1$), ($\pi_2$), ($*$) and apply the corresponding rule
$(ax)$, $(\to\text{-}E)$, $(\to\text{-}I)$, $(\wedge\text{-}I)$, $(\wedge\text{-}E_1)$, $(\wedge\text{-}E_2)$, $(\top\text{-}I)$, respectively.
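
The correspondence between typing rules and natural deduction rules can be made concrete with a small type checker. The following Python sketch is only an illustration, not part of the notes' formal development: the tuple encoding of types and terms, the function name `derive`, and the rule labels such as `"and-I"` are all assumptions made for this example. It type-checks a term and, along the way, records which natural deduction rule corresponds to each typing rule used.

```python
# Types double as formulas: a string is a basic type (atomic formula),
# ("fun", A, B) is A -> B, ("prod", A, B) is A x B (conjunction),
# ("unit",) is the type 1 (truth).
# Terms: ("var", x), ("abs", x, A, M), ("app", M, N), ("pair", M, N),
#        ("pi1", M), ("pi2", M), ("star",).

def derive(env, term):
    """Type-check `term` in context `env` (a dict from variable names to types).

    Returns (type, derivation); the derivation is a nested tuple whose first
    component names the natural deduction rule matching the typing rule used."""
    tag = term[0]
    if tag == "var":                                   # (var)  ~ (ax)
        return env[term[1]], ("ax",)
    if tag == "abs":                                   # (abs)  ~ (->-I)
        _, x, a, body = term
        b, d = derive({**env, x: a}, body)
        return ("fun", a, b), ("->I", d)
    if tag == "app":                                   # (app)  ~ (->-E)
        f, d1 = derive(env, term[1])
        a, d2 = derive(env, term[2])
        if not (isinstance(f, tuple) and f[0] == "fun" and f[1] == a):
            raise TypeError("ill-typed application")
        return f[2], ("->E", d1, d2)
    if tag == "pair":                                  # (pair) ~ (and-I)
        a, d1 = derive(env, term[1])
        b, d2 = derive(env, term[2])
        return ("prod", a, b), ("and-I", d1, d2)
    if tag in ("pi1", "pi2"):                          # (pi_i) ~ (and-E_i)
        p, d = derive(env, term[1])
        if not (isinstance(p, tuple) and p[0] == "prod"):
            raise TypeError("projection from a non-pair")
        return (p[1], ("and-E1", d)) if tag == "pi1" else (p[2], ("and-E2", d))
    if tag == "star":                                  # (*)    ~ (top-I)
        return ("unit",), ("top-I",)
    raise ValueError("unknown term constructor")


# swap = \p^{A x B}. <pi2 p, pi1 p>, read as a proof of (A and B) -> (B and A)
swap = ("abs", "p", ("prod", "A", "B"),
        ("pair", ("pi2", ("var", "p")), ("pi1", ("var", "p"))))
swap_type, swap_proof = derive({}, swap)
```

Here `swap_type` is the formula that was proved and `swap_proof` is the shape of its derivation, read off from the structure of the term.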



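6.6 Reductions in the simply-typed lambda calculus

$\beta$- and $\eta$-reductions in the simply-typed lambda calculus are defined much in the
same way as for the untyped lambda calculus, except that we have introduced
some additional terms (such as pairs and projections), which calls for some
additional reduction rules. We define the following reductions:

$$\begin{array}{llcll}
(\beta_{\to}) & (\lambda x^A.M)N & \to & M[N/x], & \\
(\eta_{\to}) & \lambda x^A.Mx & \to & M, & \text{where } x \notin FV(M), \\
(\beta_{\times,1}) & \pi_1\langle M, N\rangle & \to & M, & \\
(\beta_{\times,2}) & \pi_2\langle M, N\rangle & \to & N, & \\
(\eta_{\times}) & \langle \pi_1 M, \pi_2 M\rangle & \to & M, & \\
(\eta_1) & M & \to & *, & \text{if } M : 1.
\end{array}$$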

Then single- and multi-step $\beta$- and $\eta$-reduction are defined as the usual contextual
closure of the above rules, and the definitions of $\beta$- and $\eta$-equivalence also follow
the usual pattern. In addition to the usual (cong) and ($\xi$) rules, we now also have
congruence rules that apply to pairs and projections.

We remark that, to be perfectly precise, we should have defined reductions be- 
tween typing judgments, and not between terms. This is necessary because some 
of the reduction rules, notably $(\eta_1)$, depend on the type of the terms involved.
However, this would be notationally very cumbersome, and we will blur the dis- 
tinction, pretending at times that terms appear in some implicit typing context that 
we do not write. 

An important property of the reduction is the "subject reduction" property, which 
states that well-typed terms reduce only to well-typed terms of the same type. 
This has an immediate application to programming: subject reduction guarantees 
that if we write a program of type "integer", then the final result of evaluating the 
program, if any, will indeed be an integer, and not, say, a boolean. 

Theorem 6.1 (Subject Reduction). If $\Gamma \vdash M : A$ and $M \to_{\beta\eta} M'$, then
$\Gamma \vdash M' : A$.

Proof. By induction on the derivation of $M \to_{\beta\eta} M'$, and by case distinction on
the last rule used in the derivation of $\Gamma \vdash M : A$. For instance, if $M \to_{\beta\eta} M'$ by
$(\beta_{\to})$, then $M = (\lambda x^B.P)Q$ and $M' = P[Q/x]$. If $\Gamma \vdash M : A$, then we must
have $\Gamma, x{:}B \vdash P : A$ and $\Gamma \vdash Q : B$. It follows that $\Gamma \vdash P[Q/x] : A$; the latter
statement can be proved separately (as a "substitution lemma") by induction on $P$
and makes crucial use of the fact that $x$ and $Q$ have the same type.

The other cases are similar, and we leave them as an exercise. Note that, in
particular, one needs to consider the (cong), ($\xi$), and other congruence rules as well. $\Box$



6.7 A word on Church-Rosser

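One important theorem that does not hold for $\beta\eta$-reduction in the simply-typed
$\lambda^{\to,\times,1}$-calculus is the Church-Rosser theorem. The culprit is the rule $(\eta_1)$. For
instance, if $x$ is a variable of type $A \times 1$, then the term $M = \langle \pi_1 x, \pi_2 x\rangle$ reduces
to $x$ by $(\eta_{\times})$, but also to $\langle \pi_1 x, *\rangle$ by $(\eta_1)$. Both these terms are normal forms.
Thus, the Church-Rosser property fails.

$$x \;\xleftarrow{\;\eta_{\times}\;}\; \langle \pi_1 x, \pi_2 x\rangle \;\xrightarrow{\;\eta_1\;}\; \langle \pi_1 x, *\rangle$$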
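
The counterexample can be checked mechanically. The sketch below is an illustration under stated assumptions: the tuple encoding of terms, the helper names `type_of` and `eta_reducts`, and the choice to exclude $*$ itself from the rule $(\eta_1)$ (so that $*$ counts as a normal form) are not taken from the notes. Only the two $\eta$-rules for pairs and for the unit type are implemented, which suffices here because the terms involved contain no $\beta$-redexes.

```python
# Types: a string is a basic type, "1" is the unit type, ("prod", A, B) is A x B.
# Terms: ("var", x), ("pair", M, N), ("pi1", M), ("pi2", M), ("star",).

def type_of(env, t):
    """Type of the (assumed well-typed) term t in context env."""
    tag = t[0]
    if tag == "var":
        return env[t[1]]
    if tag == "star":
        return "1"
    if tag == "pair":
        return ("prod", type_of(env, t[1]), type_of(env, t[2]))
    if tag == "pi1":
        return type_of(env, t[1])[1]
    if tag == "pi2":
        return type_of(env, t[1])[2]
    raise ValueError(tag)

def eta_reducts(env, t):
    """All terms reachable from t by exactly one eta-step, at the root or inside."""
    out = []
    # (eta_x): <pi1 M, pi2 M> -> M, provided both components mention the same M
    if t[0] == "pair" and t[1][0] == "pi1" and t[2][0] == "pi2" and t[1][1] == t[2][1]:
        out.append(t[1][1])
    # (eta_1): M -> * if M : 1 (assumption: M is not already *)
    if t != ("star",) and type_of(env, t) == "1":
        out.append(("star",))
    # congruence: perform the step inside an immediate subterm
    for i in range(1, len(t)):
        if isinstance(t[i], tuple):
            for r in eta_reducts(env, t[i]):
                out.append(t[:i] + (r,) + t[i + 1:])
    return out

x = ("var", "x")
env = {"x": ("prod", "A", "1")}          # x : A x 1
m = ("pair", ("pi1", x), ("pi2", x))     # M = <pi1 x, pi2 x>
one_step = eta_reducts(env, m)           # the two reducts of M
```

Both elements of `one_step` have no further reducts, so they are distinct normal forms of the same term.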



There are several ways around this problem. For instance, if we omit all the
$\eta$-reductions and consider only $\beta$-reductions, then the Church-Rosser property does
hold. Eliminating $\eta$-reductions does not have much of an effect on the lambda
calculus from a computational point of view; already in the untyped lambda
calculus, we noticed that all interesting calculations could in fact be carried out with
$\beta$-reductions alone. We can say that $\beta$-reductions are the engine for computation,
whereas $\eta$-reductions only serve to clean up the result. In particular, it can never
happen that some $\eta$-reduction inhibits another $\beta$-reduction: if $M \to_{\eta} M'$, and if
$M'$ has a $\beta$-redex, then it must be the case that $M$ already has a corresponding
$\beta$-redex. Also, $\eta$-reductions always reduce the size of a term. It follows that if
$M$ is a $\beta$-normal form, then $M$ can always be reduced to a $\beta\eta$-normal form (not
necessarily unique) in a finite sequence of $\eta$-reductions.

Exercise 30. Prove the Church-Rosser theorem for $\beta$-reductions in the
$\lambda^{\to,\times,1}$-calculus. Hint: use the same method that we used in the untyped case.

Another solution is to omit the type 1 and the term $*$ from the language. In this
case, the Church-Rosser property holds even for $\beta\eta$-reduction.

Exercise 31. Prove the Church-Rosser theorem for $\beta\eta$-reduction in the
$\lambda^{\to,\times}$-calculus, i.e., the simply-typed lambda calculus without 1 and $*$.

6.8 Reduction as proof simplification

Having made a one-to-one correspondence between simply-typed lambda terms
and derivations in intuitionistic natural deduction, we may now ask what $\beta$- and
$\eta$-reductions correspond to under this correspondence. It turns out that these
reductions can be thought of as "proof simplification steps".

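Consider for example the $\beta$-reduction $\pi_1\langle M, N\rangle \to M$. If we translate the
left-hand side and the right-hand side via the Curry-Howard isomorphism (here we
use the first notation for natural deduction), we get

$$(\wedge\text{-}E_1)\ \dfrac{(\wedge\text{-}I)\ \dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \end{array} \qquad \begin{array}{c} \Gamma \\ \vdots \\ B \end{array}}{A \wedge B}}{A} \qquad\Rightarrow\qquad \begin{array}{c} \Gamma \\ \vdots \\ A \end{array}$$

We can see that the left derivation contains an introduction rule immediately
followed by an elimination rule. This leads to an obvious simplification if we replace
the left derivation by the right one.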

In general, $\beta$-redexes correspond to situations where an introduction rule is
immediately followed by an elimination rule, and $\eta$-redexes correspond to situations
where an elimination rule is immediately followed by an introduction rule. For
example, consider the $\eta$-reduction $\langle \pi_1 M, \pi_2 M\rangle \to M$. This translates to:

$$(\wedge\text{-}I)\ \dfrac{(\wedge\text{-}E_1)\ \dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}}{A} \qquad (\wedge\text{-}E_2)\ \dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}}{B}}{A \wedge B} \qquad\Rightarrow\qquad \begin{array}{c} \Gamma \\ \vdots \\ A \wedge B \end{array}$$

Again, this is an obvious simplification step, but it has a side condition: the left
and right subderivation must be the same! This side condition corresponds to the
fact that in the redex $\langle \pi_1 M, \pi_2 M\rangle$, the two subterms called $M$ must be equal. It
is another characteristic of $\eta$-reductions that they often carry such side conditions.

The reduction $M \to *$ translates as follows:

$$\begin{array}{c} \Gamma \\ \vdots \\ \top \end{array} \qquad\Rightarrow\qquad (\top\text{-}I)\ \frac{}{\top}$$

In other words, any derivation of $\top$ can be replaced by the canonical such
derivation.

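More interesting is the case of the $(\beta_{\to})$ rule. Here, we have $(\lambda x^A.M)N \to
M[N/x]$, which can be translated via the Curry-Howard Isomorphism as follows:

$$(\to\text{-}E)\ \dfrac{(\to\text{-}I_x)\ \dfrac{\begin{array}{c} \Gamma, [x{:}A] \\ \vdots \\ B \end{array}}{A \to B} \qquad \begin{array}{c} \Gamma \\ \vdots \\ A \end{array}}{B} \qquad\Rightarrow\qquad \begin{array}{c} \Gamma,\ \begin{array}{c} \Gamma \\ \vdots \\ A \end{array} \\ \vdots \\ B \end{array}$$

What is going on here is that we have a derivation $M$ of $B$ from assumptions $\Gamma$
and $A$, and we have another derivation $N$ of $A$ from $\Gamma$. We can directly obtain a
derivation of $B$ from $\Gamma$ by stacking the second derivation on top of the first!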

Notice that this last proof "simplification" step may not actually be a
simplification. Namely, if the hypothesis labeled $x$ is used many times in the derivation
$M$, then $N$ will have to be copied many times in the right-hand side term. This
corresponds to the fact that if $x$ occurs several times in $M$, then $M[N/x]$ might
be a longer and more complicated term than $(\lambda x.M)N$.
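
The growth caused by copying can be observed on a tiny example. The following Python sketch is an illustration only: the tuple encoding, the `size` measure (number of nodes in the syntax tree), and the particular terms are assumptions chosen for this example. Type annotations are omitted since they do not affect the count, and the naive substitution is adequate only because the example involves no variable capture.

```python
# Terms: ("var", x), ("lam", x, M), ("app", M, N).

def size(t):
    """Number of nodes in the syntax tree of t."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])

def subst(t, x, n):
    """t[n/x], assuming no free variable of n is bound in t (no renaming needed)."""
    if t[0] == "var":
        return n if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, n))
    return ("app", subst(t[1], x, n), subst(t[2], x, n))

def app(m, n):
    return ("app", m, n)

f, g, a, x = (("var", v) for v in "fgax")
body = app(app(app(f, x), x), x)        # M = f x x x, with three occurrences of x
arg = app(g, app(g, a))                 # N = g (g a)
redex = app(("lam", "x", body), arg)    # (\x.M) N
reduct = subst(body, "x", arg)          # M[N/x] = f N N N

sizes = (size(redex), size(reduct))
```

In this instance the reduct is larger than the redex, because `arg` is copied once for each occurrence of `x` in `body`.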

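Finally, consider the $(\eta_{\to})$ rule $\lambda x^A.Mx \to M$, where $x \notin FV(M)$. This
translates to derivations as follows:

$$(\to\text{-}I_x)\ \dfrac{(\to\text{-}E)\ \dfrac{\begin{array}{c} \Gamma \\ \vdots \\ A \to B \end{array} \qquad (ax)\ \dfrac{}{[x{:}A]}}{B}}{A \to B} \qquad\Rightarrow\qquad \begin{array}{c} \Gamma \\ \vdots \\ A \to B \end{array}$$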



6.9 Getting mileage out of the Curry-Howard isomorphism 

The Curry-Howard isomorphism makes a connection between the lambda calculus 
and logic. We can think of it as a connection between "programs" and "proofs". 
What is such a connection good for? Like any isomorphism, it allows us to switch 
back and forth and think in whichever system suits our intuition in a given situ- 
ation. Moreover, we can save a lot of work by transferring theorems that were 
proved about the lambda calculus to logic, and vice versa. As an example, we will
see in the next section how to add disjunctions to propositional intuitionistic logic,
and then we will explore what we can learn about the lambda calculus from that.

6.10 Disjunction and sum types 

To the BNF for formulas of propositional intuitionistic logic from Section 6.3, we
add the following clauses:

Formulas: $A, B ::= \ldots \mid A \vee B \mid \bot$.

Here, $A \vee B$ stands for disjunction, or "or", and $\bot$ stands for falsity, which we
can also think of as zero-ary disjunction. The symbol $\bot$ is also known by the
names of "bottom", "absurdity", or "contradiction". The rules for constructing
derivations are extended by the following cases:



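7. ($\vee$-introduction)

$$(\vee\text{-}I_1)\ \frac{\Gamma \vdash A}{\Gamma \vdash A \vee B} \qquad\qquad (\vee\text{-}I_2)\ \frac{\Gamma \vdash B}{\Gamma \vdash A \vee B}$$

In other words, if we have proven $A$ or we have proven $B$, then we may
conclude $A \vee B$.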

8. ($\vee$-elimination)

$$(\vee\text{-}E_{x,y})\ \frac{\Gamma \vdash A \vee B \qquad \Gamma, x{:}A \vdash C \qquad \Gamma, y{:}B \vdash C}{\Gamma \vdash C}$$

This is known as the "principle of case distinction". If we know $A \vee B$, and
we wish to prove some formula $C$, then we may proceed by cases. In the
first case, we assume $A$ holds and prove $C$. In the second case, we assume
$B$ holds and prove $C$. In either case, we prove $C$, which therefore holds
independently.

Note that the $\vee$-elimination rule differs from all other rules we have
considered so far, because it involves some arbitrary formula $C$ that is not directly
related to the principal formula $A \vee B$ being eliminated.

9. ($\bot$-elimination)

$$(\bot\text{-}E)\ \frac{\Gamma \vdash \bot}{\Gamma \vdash C}$$

for an arbitrary formula $C$. This rule formalizes the familiar principle "ex
falso quodlibet", which means that falsity implies anything.

There is no $\bot$-introduction rule. This is symmetric to the fact that there is no
$\top$-elimination rule.

Having extended our logic with disjunctions, we can now ask what these disjunc- 
tions correspond to under the Curry-Howard isomorphism. Naturally, we need to 
extend the lambda calculus by as many new terms as we have new rules in the 
logic. It turns out that disjunctions correspond to a concept that is quite natural in 
programming: "sum" or "union" types. 

To the lambda calculus, add type constructors A + B and 0. 

Simple types: $A, B ::= \ldots \mid A + B \mid 0$.

Intuitively, A + B is the disjoint union of A and B, as in set theory: an element of 
A + B is either an element of A or an element of B, together with an indication 
of which one is the case. In particular, if we consider an element of A + A, we 
can still tell whether it is in the left or right component, even though the two types 
are the same. In programming languages, this is sometimes known as a "union"
or "variant" type. We call it a "sum" type here. The type 0 is simply the empty
type, corresponding to the empty set in set theory.

What should the lambda terms be that go with these new types? We know from 
our experience with the Curry-Howard isomorphism that we have to have pre- 
cisely one term constructor for each introduction or elimination rule of natural 
deduction. Moreover, we know that if such a rule has n subderivations, then our 
term constructor has to have n immediate subterms. We also know something 
about bound variables: Each time a hypothesis is canceled in a natural deduction 
rule, there must be a binder of the corresponding variable in the lambda calculus. 
This information more or less uniquely determines what the lambda terms should 
be; the only choice that is left is what to call them! 

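We add four terms to the lambda calculus:

Raw terms: $M, N, P ::= \ldots \mid \mathrm{in}_1 M \mid \mathrm{in}_2 M \mid \mathrm{case}\ M\ \mathrm{of}\ x^A \Rightarrow N \mid y^B \Rightarrow P \mid \square_A M$

The typing rules for these new terms are shown in Table 5. By comparing these
rules to $(\vee\text{-}I_1)$, $(\vee\text{-}I_2)$, $(\vee\text{-}E)$, and $(\bot\text{-}E)$, you can see that they are precisely
analogous.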

$$(\mathrm{in}_1)\ \frac{\Gamma \vdash M : A}{\Gamma \vdash \mathrm{in}_1 M : A + B} \qquad\qquad (\mathrm{in}_2)\ \frac{\Gamma \vdash M : B}{\Gamma \vdash \mathrm{in}_2 M : A + B}$$

$$(\mathrm{case})\ \frac{\Gamma \vdash M : A + B \qquad \Gamma, x{:}A \vdash N : C \qquad \Gamma, y{:}B \vdash P : C}{\Gamma \vdash (\mathrm{case}\ M\ \mathrm{of}\ x^A \Rightarrow N \mid y^B \Rightarrow P) : C}$$

$$(\square)\ \frac{\Gamma \vdash M : 0}{\Gamma \vdash \square_A M : A}$$

Table 5: Typing rules for sums

But what is the meaning of these new terms? The term $\mathrm{in}_1 M$ is simply an element
of the left component of $A + B$. We can think of $\mathrm{in}_1$ as the injection function
$A \to A + B$. Similar for $\mathrm{in}_2$. The term $(\mathrm{case}\ M\ \mathrm{of}\ x^A \Rightarrow N \mid y^B \Rightarrow P)$ is a
case distinction: evaluate $M$ of type $A + B$. The answer is either an element of
the left component $A$ or of the right component $B$. In the first case, assign the
answer to the variable $x$ and evaluate $N$. In the second case, assign the answer
to the variable $y$ and evaluate $P$. Since both $N$ and $P$ are of type $C$, we get a
final result of type $C$. Note that the case statement is very similar to an
if-then-else; the only difference is that the two alternatives also carry a value. Indeed,
the booleans can be defined as $1 + 1$, in which case $T = \mathrm{in}_1 *$, $F = \mathrm{in}_2 *$, and
$\mathrm{if\_then\_else}\ M N P = \mathrm{case}\ M\ \mathrm{of}\ x^1 \Rightarrow N \mid y^1 \Rightarrow P$, where $x$ and $y$ don't occur
in $N$ and $P$, respectively.
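
To connect this with ordinary programming, here is a minimal Python sketch of sum types as tagged values. The class names `In1` and `In2`, the function `case`, and the use of the empty tuple for $*$ are illustrative assumptions; Python does not enforce the typing rules of Table 5, and it evaluates both branches of `if_then_else` before choosing one, which is harmless for plain values.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class In1:
    value: Any   # an element of the left component A

@dataclass(frozen=True)
class In2:
    value: Any   # an element of the right component B

def case(m, left: Callable, right: Callable):
    """case m of x => left(x) | y => right(y)"""
    if isinstance(m, In1):
        return left(m.value)
    if isinstance(m, In2):
        return right(m.value)
    raise TypeError("not an element of a sum type")

STAR = ()            # stands for the unique element * of the type 1
TRUE = In1(STAR)     # T = in1 *
FALSE = In2(STAR)    # F = in2 *

def if_then_else(m, n, p):
    # the bound variables of the two branches are ignored, as in the text
    return case(m, lambda _x: n, lambda _y: p)

# An element of A + A still remembers which component it came from.
left_three, right_three = In1(3), In2(3)
described = [case(v, lambda x: ("left", x), lambda y: ("right", y))
             for v in (left_three, right_three)]
```

The list `described` shows that the two copies of 3 are distinguished by their tags, which is exactly the "indication of which one is the case" mentioned above.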

Finally, the term $\square_A M$ is a simple type cast, corresponding to the unique function
$\square_A : 0 \to A$ from the empty set to any set $A$.

6.11 Classical logic vs. intuitionistic logic 

We have mentioned before that the natural deduction calculus we have presented 
corresponds to intuitionistic logic, and not classical logic. But what exactly is the 
difference? Well, the difference is that in intuitionistic logic, we have no rule for 
proof by contradiction, and we do not have $A \vee \neg A$ as an axiom.

Let us adopt the following convention for negation: the formula $\neg A$ ("not $A$") is
regarded as an abbreviation for $A \to \bot$. This way, we do not have to introduce
special formulas and rules for negation; we simply use the existing rules for $\to$
and $\bot$.

In intuitionistic logic, there is no derivation of $A \vee \neg A$, for general $A$. Or
equivalently, in the simply-typed lambda calculus, there is no closed term of type
$A + (A \to 0)$. We are not yet in a position to prove this formally, but informally,
the argument goes as follows: If the type $A$ is empty, then there can be no closed
term of type $A$ (otherwise $A$ would have that term as an element). On the other
hand, if the type $A$ is non-empty, then there can be no closed term of type $A \to 0$
(or otherwise, if we applied that term to some element of $A$, we would obtain an
element of 0). But if we were to write a generic term of type $A + (A \to 0)$, then
this term would have to work no matter what $A$ is. Thus, the term would have to
decide whether to use the left or right component independently of $A$. But for any
such term, we can get a contradiction by choosing $A$ either empty or non-empty.

Closely related is the fact that in intuitionistic logic, we do not have a principle of
proof by contradiction. The "proof by contradiction" rule is the following:

$$(\mathrm{contra}_x)\ \frac{\Gamma, x{:}\neg A \vdash \bot}{\Gamma \vdash A}.$$

This is not a rule of intuitionistic propositional logic, but we can explore what
would happen if we were to add such a rule. First, we observe that the
contradiction rule is very similar to the following:

$$\frac{\Gamma, x{:}A \vdash \bot}{\Gamma \vdash \neg A}.$$

However, since we defined $\neg A$ to be the same as $A \to \bot$, the latter rule is an
instance of $(\to\text{-}I)$. The contradiction rule, on the other hand, is not an instance of
$(\to\text{-}I)$.
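If we admit the rule (contra), then $A \vee \neg A$ can be derived. The following is such
a derivation, written line by line, with each step citing the rule and the earlier
lines it uses:

$$\begin{array}{lll}
1. & y{:}\neg(A \vee \neg A),\ x{:}A \vdash A & (ax_x) \\
2. & y{:}\neg(A \vee \neg A),\ x{:}A \vdash A \vee \neg A & (\vee\text{-}I_1),\ 1 \\
3. & y{:}\neg(A \vee \neg A),\ x{:}A \vdash \neg(A \vee \neg A) & (ax_y) \\
4. & y{:}\neg(A \vee \neg A),\ x{:}A \vdash \bot & (\to\text{-}E),\ 3,\ 2 \\
5. & y{:}\neg(A \vee \neg A) \vdash \neg A & (\to\text{-}I_x),\ 4 \\
6. & y{:}\neg(A \vee \neg A) \vdash A \vee \neg A & (\vee\text{-}I_2),\ 5 \\
7. & y{:}\neg(A \vee \neg A) \vdash \neg(A \vee \neg A) & (ax_y) \\
8. & y{:}\neg(A \vee \neg A) \vdash \bot & (\to\text{-}E),\ 7,\ 6 \\
9. & \vdash A \vee \neg A & (\mathrm{contra}_y),\ 8
\end{array}$$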

Conversely, if we added $A \vee \neg A$ as an axiom to intuitionistic logic, then this
already implies the (contra) rule. Namely, from any derivation of $\Gamma, x{:}\neg A \vdash \bot$,
we can obtain a derivation of $\Gamma \vdash A$ by using $A \vee \neg A$ as an axiom. Thus, we can
simulate the (contra) rule, in the presence of $A \vee \neg A$.

$$\begin{array}{lll}
1. & \Gamma \vdash A \vee \neg A & (\text{excluded middle}) \\
2. & \Gamma, x{:}\neg A \vdash \bot & (\text{given}) \\
3. & \Gamma, x{:}\neg A \vdash A & (\bot\text{-}E),\ 2 \\
4. & \Gamma, y{:}A \vdash A & (ax_y) \\
5. & \Gamma \vdash A & (\vee\text{-}E_{y,x}),\ 1,\ 4,\ 3
\end{array}$$

In this sense, we can say that the rule (contra) and the axiom $A \vee \neg A$ are
equivalent, in the presence of the other axioms and rules of intuitionistic logic.

It turns out that the system of intuitionistic logic plus (contra) is equivalent to 
classical logic as we know it. It is in this sense that we can say that intuitionistic 
logic is "classical logic without proofs by contradiction". 

Exercise 32. The formula $((A \to B) \to A) \to A$ is called "Peirce's law". It is
valid in classical logic, but not in intuitionistic logic. Give a proof of Peirce's law
in natural deduction, using the rule (contra).

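Conversely, Peirce's law, when added to intuitionistic logic for all $A$ and $B$,
implies (contra). Here is the proof. Recall that $\neg A$ is an abbreviation for $A \to \bot$.

$$\begin{array}{lll}
1. & \Gamma \vdash ((A \to \bot) \to A) \to A & (\text{Peirce's law for } B = \bot) \\
2. & \Gamma, x{:}A \to \bot \vdash \bot & (\text{given}) \\
3. & \Gamma, x{:}A \to \bot \vdash A & (\bot\text{-}E),\ 2 \\
4. & \Gamma \vdash (A \to \bot) \to A & (\to\text{-}I_x),\ 3 \\
5. & \Gamma \vdash A & (\to\text{-}E),\ 1,\ 4
\end{array}$$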

We summarize the results of this section in terms of a slogan: 

intuitionistic logic + (contra) 
= intuitionistic logic + "$A \vee \neg A$"
= intuitionistic logic + Peirce's law 
= classical logic. 

The proof theory of intuitionistic logic is a very interesting subject in its own right, 
and an entire course could be taught just on that subject. 



6.12 Classical logic and the Curry-Howard isomorphism 

To extend the Curry-Howard isomorphism to classical logic, according to the
observations of the previous section, it is sufficient to add to the lambda calculus a
term representing Peirce's law. All we have to do is to add a term
$\mathcal{C} : ((A \to B) \to A) \to A$, for all types $A$ and $B$.

Such a term is known as Felleisen's $\mathcal{C}$, and it has a specific interpretation in terms
of programming languages. It can be understood as a control operator (similar
to "goto", "break", or exception handling in some procedural programming
languages).

Specifically, Felleisen's interpretation requires a term of the form

$$M = \mathcal{C}(\lambda k^{A \to B}.N) : A$$

to be evaluated as follows. To evaluate $M$, first evaluate $N$. Note that both $M$ and
$N$ have type $A$. If $N$ returns a result, then this immediately becomes the result of
$M$ as well. On the other hand, if during the evaluation of $N$, the function $k$ is ever
called with some argument $x : A$, then the further evaluation of $N$ is aborted, and
$x$ immediately becomes the result of $M$.

In other words, the final result of M can be calculated anywhere inside N, no 
matter how deeply nested, by passing it to k as an argument. The function k is 
known as a continuation. 
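
The evaluation behavior just described can be imitated, in a limited form, with exceptions. The Python sketch below is an analogy and not an implementation of Felleisen's operator: the names `call_with_escape` and `first_negative` are invented for this example, and the function `k` may only be used while `body` is still running (a so-called escaping continuation).

```python
class _Escape(Exception):
    """Carries a result out of the body of call_with_escape."""
    def __init__(self, tag, value):
        super().__init__()
        self.tag, self.value = tag, value

def call_with_escape(body):
    """Evaluate body(k). If body returns normally, that is the result.
    If k(x) is called during the evaluation, the rest of body is abandoned
    and x becomes the result."""
    tag = object()                      # identifies this particular call

    def k(x):
        raise _Escape(tag, x)

    try:
        return body(k)
    except _Escape as e:
        if e.tag is tag:
            return e.value
        raise                           # belongs to an enclosing call

def first_negative(xs):
    """Return the first negative element of xs, or None if there is none."""
    def body(k):
        for x in xs:
            if x < 0:
                k(x)                    # aborts the loop, however deeply nested
        return None
    return call_with_escape(body)

result = first_negative([3, 1, -4, -1, 5])
```

Note that in `call_with_escape(lambda k: 1 + k(10))` the pending addition is discarded, so the overall result is 10 rather than 11.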

There is a lot more to programming with continuations than can be explained in
these lecture notes. For an interesting application of continuations to compiling,
see e.g. [9] from the bibliography. The above explanation of what
it means to "evaluate" the term $M$ glosses over several details. In particular, we
have not given a reduction rule for $\mathcal{C}$ in the style of $\beta$-reduction. To do so is rather
complicated and is beyond the scope of these notes.



7 Polymorphism 

The polymorphic lambda calculus, also known as "System F", is obtained by
extending the Curry-Howard isomorphism to the quantifier $\forall$. For example, consider the
identity function $\lambda x^A.x$. This function has type $A \to A$. Another identity
function is $\lambda x^B.x$ of type $B \to B$, and so forth for every type. We can thus think of
the identity function as a family of functions, one for each type. In the
polymorphic lambda calculus, there is a dedicated syntax for such families, and we write
$\Lambda\alpha.\lambda x^{\alpha}.x$ of type $\forall\alpha.\alpha \to \alpha$. Please read Chapter 11 of "Proofs and Types" by
Girard, Lafont, and Taylor [2].



8 Weak and strong normalization 
8.1 Definitions 

As we have seen, computing with lambda terms means reducing lambda terms to
normal form. By the Church-Rosser theorem, such a normal form is guaranteed
to be unique if it exists. But so far, we have paid little attention to the question
whether normal forms exist for a given term, and if so, how we need to reduce the
term to find a normal form.

Definition. Given a notion of term and a reduction relation, we say that a term $M$
is weakly normalizing if there exists a finite sequence of reductions
$M \to M_1 \to \ldots \to M_n$ such that $M_n$ is a normal form. We say that $M$ is strongly normalizing
if there does not exist an infinite sequence of reductions starting from $M$, or in
other words, if every sequence of reductions starting from $M$ is finite.

Recall the following consequence of the Church-Rosser theorem, which we stated
as Corollary 4.2: If $M$ has a normal form $N$, then $M \twoheadrightarrow N$. It follows that a
term $M$ is weakly normalizing if and only if it has a normal form. This does not
imply that every possible way of reducing $M$ leads to a normal form. A term is
strongly normalizing if and only if every way of reducing it leads to a normal form
in finitely many steps.

Consider for example the following terms in the untyped lambda calculus:

1. The term $\Omega = (\lambda x.xx)(\lambda x.xx)$ is neither weakly nor strongly normalizing.
It does not have a normal form.

2. The term $(\lambda x.y)\Omega$ is weakly normalizing, but not strongly normalizing. It
reduces to the normal form $y$, but it also has an infinite reduction sequence.

3. The term $(\lambda x.y)((\lambda x.x)(\lambda x.x))$ is strongly normalizing. While there are
several different ways to reduce this term, they all lead to a normal form in
finitely many steps.

4. The term $\lambda x.x$ is strongly normalizing, since it has no reductions, much
less an infinite reduction sequence. More generally, every normal form is
strongly normalizing.

We see immediately that strongly normalizing implies weakly normalizing. How- 
ever, as the above examples show, the converse is not true. 
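
These examples can be explored with a small reducer. The following Python sketch is illustrative only: the tuple encoding, the two strategies selected by the flag `args_first`, and the step budget `max_steps` are assumptions of this example. Reaching the budget does not prove non-termination; it merely reports that no normal form was found within the given number of steps. The naive substitution is adequate because every substituted term here is closed.

```python
# Terms: ("var", x), ("lam", x, M), ("app", M, N).

def subst(t, x, n):
    """t[n/x]; adequate here because every substituted term n is closed."""
    if t[0] == "var":
        return n if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, n))
    return ("app", subst(t[1], x, n), subst(t[2], x, n))

def step(t, args_first):
    """Perform one beta-step, or return None if t is a normal form.

    args_first=False contracts the leftmost-outermost redex;
    args_first=True reduces the argument of an application before anything else."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        r = step(t[2], args_first)
        return None if r is None else ("lam", t[1], r)
    f, a = t[1], t[2]
    if args_first:
        r = step(a, args_first)
        if r is not None:
            return ("app", f, r)
    if f[0] == "lam":
        return subst(f[2], f[1], a)
    r = step(f, args_first)
    if r is not None:
        return ("app", r, a)
    r = step(a, args_first)
    return None if r is None else ("app", f, r)

def normalize(t, args_first, max_steps=50):
    """Return the normal form reached, or None if the step budget is exhausted."""
    for _ in range(max_steps + 1):
        r = step(t, args_first)
        if r is None:
            return t
        t = r
    return None

y = ("var", "y")
identity = ("lam", "x", ("var", "x"))
delta = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
omega = ("app", delta, delta)                               # example 1
t2 = ("app", ("lam", "x", y), omega)                        # example 2
t3 = ("app", ("lam", "x", y), ("app", identity, identity))  # example 3
```

With these definitions, `normalize(t2, args_first=False)` finds the normal form `y`, while `normalize(t2, args_first=True)` keeps rewriting the argument `omega` and gives up; for `t3` both strategies reach `y`.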



8.2 Weak and strong normalization in typed lambda calculus 

We found that the term $\Omega = (\lambda x.xx)(\lambda x.xx)$ is not weakly or strongly
normalizing. On the other hand, we also know that this term is not typeable in the
simply-typed lambda calculus. This is not a coincidence, as the following theorem shows.

Theorem 8.1 (Weak normalization theorem). In the simply-typed lambda
calculus, all terms are weakly normalizing.

Theorem 8.2 (Strong normalization theorem). In the simply-typed lambda calcu- 
lus, all terms are strongly normalizing. 

Clearly, the strong normalization theorem implies the weak normalization theo- 
rem. However, the weak normalization theorem is much easier to prove, which 
is the reason we proved both these theorems in class. In particular, the proof of 
the weak normalization theorem gives an explicit measure of the complexity of 
a term, in terms of the number of redexes of a certain degree in the term. There 
is no corresponding complexity measure in the proof of the strong normalization 
theorem. 

Theorem 8.3 (Strong normalization theorem for System F). In the polymorphic 
lambda calculus (System F), all terms are strongly normalizing. 

Please refer to Chapters 4, 6, and 14 of "Proofs and Types" by Girard, Lafont, and
Taylor [2] for the proofs of Theorems 8.1, 8.2, and 8.3, respectively.



9 Type inference 

In Section 6, we introduced the simply-typed lambda calculus, and we discussed
what it means for a term to be well-typed. We have also asked the question, for a 
given term, whether it is typeable or not. 

In this section, we will discuss an algorithm that decides, given a term, whether 
it is typeable or not, and if the answer is yes, it also outputs a type for the term. 
Such an algorithm is known as a type inference algorithm. 

A weaker kind of algorithm is a type checking algorithm. A type checking algo- 
rithm takes as its input a term with full type annotations, as well as the types of 
any free variables, and it decides whether the term is well-typed or not. Thus, a 
type checking algorithm does not infer any types; the type must be given to it as 
an input and the algorithm merely checks whether the type is legal. 

Many compilers of programming languages include a type checker, and programs 
that are not well-typed are typically refused. The compilers of some programming 
languages, such as ML or Haskell, go one step further and include a type infer- 
ence algorithm. This allows programmers to write programs with no or very few 
type annotations, and the compiler will figure out the types automatically. This
makes the programmer's life much easier, especially in the case of higher-order
languages, where types such as $((A \to B) \to C) \to D$ are not uncommon and
would be very cumbersome to write down. However, in the event that type
inference fails, it is not always easy for the compiler to issue a meaningful error
message that can help the human programmer fix the problem. Often, at least a
basic understanding of how the type inference algorithm works is necessary for
programmers to understand these error messages.

9.1 Principal types 

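A simply-typed lambda term can have more than one possible type. Suppose that
we have three basic types $\iota_1, \iota_2, \iota_3$ in our type system. Then the following are all
valid typing judgments for the term $\lambda x.\lambda y.yx$:

$$\begin{array}{l}
\vdash \lambda x^{\iota_1}.\lambda y^{\iota_1 \to \iota_1}.yx : \iota_1 \to (\iota_1 \to \iota_1) \to \iota_1, \\
\vdash \lambda x^{\iota_2 \to \iota_3}.\lambda y^{(\iota_2 \to \iota_3) \to \iota_3}.yx : (\iota_2 \to \iota_3) \to ((\iota_2 \to \iota_3) \to \iota_3) \to \iota_3, \\
\vdash \lambda x^{\iota_1}.\lambda y^{\iota_1 \to \iota_3}.yx : \iota_1 \to (\iota_1 \to \iota_3) \to \iota_3, \\
\vdash \lambda x^{\iota_1}.\lambda y^{\iota_1 \to \iota_3 \to \iota_2}.yx : \iota_1 \to (\iota_1 \to \iota_3 \to \iota_2) \to \iota_3 \to \iota_2, \\
\vdash \lambda x^{\iota_1}.\lambda y^{\iota_1 \to \iota_1 \to \iota_1}.yx : \iota_1 \to (\iota_1 \to \iota_1 \to \iota_1) \to \iota_1 \to \iota_1.
\end{array}$$

What all these typing judgments have in common is that they are of the form

$$\vdash \lambda x^A.\lambda y^{A \to B}.yx : A \to (A \to B) \to B,$$

for certain types $A$ and $B$. In fact, as we will see, every possible type of the term
$\lambda x.\lambda y.yx$ is of this form. We also say that $A \to (A \to B) \to B$ is the most
general type or the principal type of this term, where $A$ and $B$ are placeholders
for arbitrary types.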

The existence of a most general type is not a peculiarity of the term $\lambda xy.yx$, but
it is true of the simply-typed lambda calculus in general: every typeable term has
a most general type. This statement is known as the principal type property.

We will see that our type inference algorithm not only calculates a possible type 
for a term, but in fact it calculates the most general type, if any type exists at all. 
In fact, we will prove the principal type property by closely examining the type
inference algorithm.

9.2 Type templates and type substitutions

In order to formalize the notion of a most general type, we need to be able to speak 
of types with placeholders. 

Definition. Suppose we are given an infinite set of type variables, which we
denote by upper case letters $X$, $Y$, $Z$ etc. A type template is a simple type, built from
type variables and possibly basic types. Formally, type templates are given by the
BNF

Type templates: $A, B ::= X \mid \iota \mid A \to B \mid A \times B \mid 1$

Note that we use the same letters $A$, $B$ to denote type templates that we previously
used to denote types. In fact, from now on, we will simply regard types as special
type templates that happen to contain no type variables.

The point of type variables is that they are placeholders (just like any other kind of 
variables). This means, we can replace type variables by arbitrary types, or even 
by type templates. A type substitution is just such a replacement. 

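Definition. A type substitution $\sigma$ is a function from type variables to type
templates. We often write $[X_1 \mapsto A_1, \ldots, X_n \mapsto A_n]$ for the substitution defined by
$\sigma(X_i) = A_i$ for $i = 1 \ldots n$, and $\sigma(Y) = Y$ if $Y \notin \{X_1, \ldots, X_n\}$. If $\sigma$ is a type
substitution, and $A$ is a type template, then we define $\bar\sigma A$, the application of $\sigma$ to
$A$, as follows by recursion on $A$:

$$\begin{array}{rcl}
\bar\sigma X &=& \sigma X, \\
\bar\sigma \iota &=& \iota, \\
\bar\sigma (A \to B) &=& \bar\sigma A \to \bar\sigma B, \\
\bar\sigma (A \times B) &=& \bar\sigma A \times \bar\sigma B, \\
\bar\sigma 1 &=& 1.
\end{array}$$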

In words, $\bar\sigma A$ is simply the same as $A$, except that all the type variables have been
replaced according to $\sigma$. We are now in a position to formalize what it means for
one type template to be more general than another.

Definition. Suppose A and B are type templates. We say that A is more general 
than B if there exists a type substitution a such that a A = B. 

In other words, we consider A to be more general than B if B can be obtained 
from A by a substitution. We also say that B is an instance of A. Examples: 

• $X \to Y$ is more general than $X \to X$.

• $X \to X$ is more general than $\iota \to \iota$.

• $X \to X$ is more general than $(\iota \to \iota) \to (\iota \to \iota)$.

• Neither of $\iota \to \iota$ and $(\iota \to \iota) \to (\iota \to \iota)$ is more
  general than the other. We say that these types are incomparable.

• $X \to Y$ is more general than $W \to Z$, and vice versa. We say that $X \to Y$
  and $W \to Z$ are equally general.
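
The examples above can be checked mechanically. The following is a minimal sketch, not part of the original notes: it assumes an encoding of type templates as nested Python tuples and of substitutions as dictionaries, and the helper names `apply_subst`, `match`, and `more_general` are hypothetical. Deciding whether $A$ is more general than $B$ amounts to searching for a substitution that maps $A$ onto $B$.

```python
# Type templates as nested tuples (an encoding assumed for this sketch):
#   ("var", "X")    type variable      ("base", "i")   basic type
#   ("fun", A, B)   A -> B             ("prod", A, B)  A x B
#   ("unit",)       the type 1

def apply_subst(sigma, a):
    """Apply a substitution (a dict from variable names to templates) to a template."""
    if a[0] == "var":
        return sigma.get(a[1], a)          # variables outside the domain stay put
    if a[0] in ("fun", "prod"):
        return (a[0], apply_subst(sigma, a[1]), apply_subst(sigma, a[2]))
    return a                               # basic types and 1 contain no variables

def match(a, b, sigma=None):
    """Return sigma with apply_subst(sigma, a) == b, or None if none exists."""
    sigma = {} if sigma is None else sigma
    if a[0] == "var":
        if a[1] not in sigma:
            sigma[a[1]] = b                # first occurrence: bind the variable
        return sigma if sigma[a[1]] == b else None
    if a[0] != b[0]:
        return None                        # different outermost constructors
    if a[0] in ("fun", "prod"):
        sigma = match(a[1], b[1], sigma)
        return None if sigma is None else match(a[2], b[2], sigma)
    return sigma if a == b else None       # basic types must agree; 1 matches 1

def more_general(a, b):
    """True if b is an instance of a."""
    return match(a, b) is not None

X, Y, W, Z = (("var", v) for v in "XYWZ")
i = ("base", "i")
fun = lambda a, b: ("fun", a, b)

print(more_general(fun(X, Y), fun(X, X)))   # True
print(more_general(fun(X, X), fun(X, Y)))   # False
```

Two templates are equally general exactly when `more_general` holds in both directions, and incomparable when it holds in neither.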

We can also speak of one substitution being more general than another: 

Definition. If $\tau$ and $\rho$ are type substitutions, we say that $\tau$ is more
general than $\rho$ if there exists a type substitution $\sigma$ such that
$\bar\sigma \circ \tau = \rho$.

9.3 Unifiers 

We will be concerned with solving equations between type templates. The basic
question is not very different from solving equations in arithmetic: given an
equation between expressions, for instance $x + y = x^2$, is it possible to find
values for $x$ and $y$ that make the equation true? The answer is yes in this case:
for instance, $x = 2$, $y = 2$ is one solution, and $x = 1$, $y = 0$ is another
possible solution. We can even give the most general solution, which is
$x$ = arbitrary, $y = x^2 - x$.

Similarly, for type templates, we might ask whether an equation such as 

$$X \to (X \to Y) = (Y \to Z) \to W$$

has any solutions. The answer is yes, and one solution, for instance, is
$X = \iota \to \iota$, $Y = \iota$, $Z = \iota$, $W = (\iota \to \iota) \to \iota$.
But this is not the most general solution; the most general solution, in this case,
is $Y$ = arbitrary, $Z$ = arbitrary, $X = Y \to Z$, $W = (Y \to Z) \to Y$.

We use substitutions to represent the solutions to such equations. For instance, the
most general solution to the sample equation from the last paragraph is represented
by the substitution

$$\sigma = [X \mapsto Y \to Z,\; W \mapsto (Y \to Z) \to Y].$$

If a substitution $\sigma$ solves the equation $A = B$ in this way, then we also say
that $\sigma$ is a unifier of $A$ and $B$.

To give another example, consider the equation

$$X \times (X \to Z) = (Z \to Y) \times Y.$$

This equation does not have any solution, because we would have to have both
$X = Z \to Y$ and $Y = X \to Z$, which implies $X = Z \to (X \to Z)$, which
is impossible to solve in simple types. We also say that $X \times (X \to Z)$ and
$(Z \to Y) \times Y$ cannot be unified.

In general, we will be concerned with solving not just single equations, but sys- 
tems of several equations. The formal definition of unifiers and most general 
unifiers is as follows: 

Definition. Given two sequences of type templates $\bar A = A_1, \ldots, A_n$ and
$\bar B = B_1, \ldots, B_n$, we say that a type substitution $\sigma$ is a unifier of
$\bar A$ and $\bar B$ if $\bar\sigma A_i = \bar\sigma B_i$, for all $i = 1 \ldots n$.
Moreover, we say that $\sigma$ is a most general unifier of $\bar A$ and $\bar B$ if
it is a unifier, and if it is more general than any other unifier of $\bar A$ and
$\bar B$.

9.4 The unification algorithm 

Unification is the process of determining a most general unifier. More specifically,
unification is an algorithm whose inputs are two sequences of type templates
$\bar A = A_1, \ldots, A_n$ and $\bar B = B_1, \ldots, B_n$, and whose output is either
"failure", if no unifier exists, or else a most general unifier $\sigma$. We call this
algorithm mgu for "most general unifier", and we write $\mathrm{mgu}(\bar A; \bar B)$
for the result of applying the algorithm to $\bar A$ and $\bar B$.

Before we state the algorithm, let us note that we only use finitely many type
variables, namely, the ones that occur in $\bar A$ and $\bar B$. In particular, the
substitutions generated by this algorithm are finite objects that can be represented
and manipulated by a computer.

The algorithm for calculating $\mathrm{mgu}(\bar A; \bar B)$ is as follows. By
convention, the algorithm chooses the first applicable clause in the following list.
Note that the algorithm is recursive.

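1. $\mathrm{mgu}(X; X) = \mathrm{id}$, the identity substitution.

2. $\mathrm{mgu}(X; B) = [X \mapsto B]$, if $X$ does not occur in $B$.

3. $\mathrm{mgu}(X; B)$ fails, if $X$ occurs in $B$ and $B \neq X$.

4. $\mathrm{mgu}(A; Y) = [Y \mapsto A]$, if $Y$ does not occur in $A$.

5. $\mathrm{mgu}(A; Y)$ fails, if $Y$ occurs in $A$ and $A \neq Y$.

6. $\mathrm{mgu}(\iota; \iota) = \mathrm{id}$.

7. $\mathrm{mgu}(A_1 \to A_2; B_1 \to B_2) = \mathrm{mgu}(A_1, A_2; B_1, B_2)$.

8. $\mathrm{mgu}(A_1 \times A_2; B_1 \times B_2) = \mathrm{mgu}(A_1, A_2; B_1, B_2)$.

9. $\mathrm{mgu}(1; 1) = \mathrm{id}$.

10. $\mathrm{mgu}(A; B)$ fails, in all other cases.

11. $\mathrm{mgu}(A, \bar A; B, \bar B) = \bar\tau \circ \rho$, where
    $\rho = \mathrm{mgu}(A; B)$ and $\tau = \mathrm{mgu}(\bar\rho \bar A; \bar\rho \bar B)$.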

Note that clauses 1-10 calculate the most general unifier of two type templates,
whereas clause 11 deals with lists of type templates. Clause 10 is a catch-all clause
that fails if none of the earlier clauses apply. In particular, this clause causes the
following to fail: $\mathrm{mgu}(A_1 \to A_2; B_1 \times B_2)$,
$\mathrm{mgu}(A_1 \to A_2; \iota)$, etc.
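
The eleven clauses translate almost line by line into a program. The sketch below is an illustration under stated assumptions rather than a definitive implementation: templates are encoded as nested tuples, substitutions as dictionaries, and the names `UnificationFailure`, `apply_subst`, `occurs`, `compose`, `mgu`, and `mgu_list` are hypothetical. The base case for empty lists in `mgu_list` is an addition that the clauses above leave implicit.

```python
# Templates: ("var", X), ("base", b), ("fun", A, B), ("prod", A, B), ("unit",)
# Substitutions: dicts from variable names to templates (identity elsewhere).

class UnificationFailure(Exception):
    """Raised where the algorithm in the text says "fails"."""

def apply_subst(sigma, a):
    if a[0] == "var":
        return sigma.get(a[1], a)
    if a[0] in ("fun", "prod"):
        return (a[0], apply_subst(sigma, a[1]), apply_subst(sigma, a[2]))
    return a

def occurs(x, a):
    """Does the type variable named x occur in the template a?"""
    if a[0] == "var":
        return a[1] == x
    return any(occurs(x, c) for c in a[1:] if isinstance(c, tuple))

def compose(tau, sigma):
    """The substitution that first applies sigma and then tau."""
    out = {x: apply_subst(tau, a) for x, a in sigma.items()}
    for x, a in tau.items():
        out.setdefault(x, a)
    return out

def mgu(a, b):
    """Clauses 1-10: unify two templates."""
    if a[0] == "var":
        if a == b:
            return {}                                   # clause 1
        if occurs(a[1], b):
            raise UnificationFailure("occurs check")    # clause 3
        return {a[1]: b}                                # clause 2
    if b[0] == "var":
        if occurs(b[1], a):
            raise UnificationFailure("occurs check")    # clause 5
        return {b[1]: a}                                # clause 4
    if a[0] == b[0] and a[0] in ("fun", "prod"):
        return mgu_list([a[1], a[2]], [b[1], b[2]])     # clauses 7 and 8
    if a == b:
        return {}                                       # clauses 6 and 9
    raise UnificationFailure("constructor clash")       # clause 10

def mgu_list(as_, bs):
    """Clause 11, with the identity as the (assumed) result for empty lists."""
    if not as_:
        return {}
    rho = mgu(as_[0], bs[0])
    tau = mgu_list([apply_subst(rho, a) for a in as_[1:]],
                   [apply_subst(rho, b) for b in bs[1:]])
    return compose(tau, rho)

X, Y, Z, W = (("var", v) for v in "XYZW")
fun = lambda a, b: ("fun", a, b)

sigma = mgu(fun(X, fun(X, Y)), fun(fun(Y, Z), W))
print(sigma["X"])   # ('fun', ('var', 'Y'), ('var', 'Z'))
```

On the two sample equations of Section 9.3, this sketch returns the substitution $[X \mapsto Y \to Z, W \mapsto (Y \to Z) \to Y]$ for the first and raises `UnificationFailure` for the second.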

Proposition 9.1. If $\mathrm{mgu}(\bar A; \bar B) = \sigma$, then $\sigma$ is a most
general unifier of $\bar A$ and $\bar B$. If $\mathrm{mgu}(\bar A; \bar B)$ fails, then
$\bar A$ and $\bar B$ have no unifier.

Proof. First, it is easy to prove by induction on the definition of mgu that if
$\mathrm{mgu}(\bar A; \bar B) = \sigma$, then $\sigma$ is a unifier of $\bar A$ and
$\bar B$. This is evident in all cases except perhaps clause 11: but here, by
induction hypothesis, $\bar\rho A = \bar\rho B$ and
$\bar\tau(\bar\rho \bar A) = \bar\tau(\bar\rho \bar B)$, hence also
$\bar\tau(\bar\rho(A, \bar A)) = \bar\tau(\bar\rho(B, \bar B))$. Here we have used
the evident notation of applying a substitution to a list of type templates.

Second, we prove that if $\bar A$ and $\bar B$ can be unified, then
$\mathrm{mgu}(\bar A; \bar B)$ returns a most general unifier. This is again proved by
induction. For example, in clause 2, we have $\sigma = [X \mapsto B]$. Suppose $\tau$
is another unifier of $X$ and $B$. Then $\bar\tau X = \bar\tau B$. We claim that
$\bar\tau \circ \sigma = \tau$. But
$\bar\tau(\sigma(X)) = \bar\tau(B) = \bar\tau(X) = \tau(X)$, whereas if $Y \neq X$,
then $\bar\tau(\sigma(Y)) = \bar\tau(Y) = \tau(Y)$. Hence
$\bar\tau \circ \sigma = \tau$, and it follows that $\sigma$ is more general than
$\tau$. The clauses 1-10 all follow by similar arguments. For clause 11, suppose that
$A, \bar A$ and $B, \bar B$ have some unifier $\sigma'$. Then $\sigma'$ is also a
unifier for $A$ and $B$, and thus the recursive call returns a most general unifier
$\rho$ of $A$ and $B$. Since $\rho$ is more general than $\sigma'$, we have
$\bar\kappa \circ \rho = \sigma'$ for some substitution $\kappa$. But then
$\bar\kappa(\bar\rho \bar A) = \bar{\sigma}' \bar A = \bar{\sigma}' \bar B = \bar\kappa(\bar\rho \bar B)$,
hence $\kappa$ is a unifier for $\bar\rho \bar A$ and $\bar\rho \bar B$. By induction
hypothesis, $\tau = \mathrm{mgu}(\bar\rho \bar A; \bar\rho \bar B)$ exists and is a most
general unifier for $\bar\rho \bar A$ and $\bar\rho \bar B$. It follows that $\tau$ is
more general than $\kappa$, thus $\bar{\kappa}' \circ \tau = \kappa$, for some
substitution $\kappa'$. Finally we need to show that $\sigma = \bar\tau \circ \rho$ is
more general than $\sigma'$. But this follows because
$\bar{\kappa}' \circ \sigma = \bar{\kappa}' \circ \bar\tau \circ \rho = \bar\kappa \circ \rho = \sigma'$. $\Box$

Remark 9.2. Proving that the algorithm mgu terminates is tricky. In particular,
termination can't be proved by induction on the size of the arguments, because
in the second recursive call in clause 11, the application of $\bar\rho$ may well
increase the size of the arguments. To prove termination, note that each substitution
$\sigma$ generated by the algorithm is either the identity, or else it eliminates at
least one
variable. We can use this to prove termination by nested induction on the number 
of variables and on the size of the arguments. We leave the details for another 
time. 



9.5 The type inference algorithm 

Given the unification algorithm, type inference is now relatively easy. We formulate
another algorithm, typeinfer, which takes a typing judgment $\Gamma \vdash M : B$
as its input (using templates instead of types, and not necessarily a valid typing
judgment). The algorithm either outputs a most general substitution $\sigma$ such that
$\bar\sigma\Gamma \vdash M : \bar\sigma B$ is a valid typing judgment, or if no such
$\sigma$ exists, the algorithm fails.

In other words, the algorithm calculates the most general substitution that makes 
the given typing judgment valid. It is defined as follows: 

1. $\mathrm{typeinfer}(x_1 : A_1, \ldots, x_n : A_n \vdash x_i : B) = \mathrm{mgu}(A_i; B)$.

2. $\mathrm{typeinfer}(\Gamma \vdash MN : B) = \bar\tau \circ \sigma$, where
   $\sigma = \mathrm{typeinfer}(\Gamma \vdash M : X \to B)$ and
   $\tau = \mathrm{typeinfer}(\bar\sigma\Gamma \vdash N : \bar\sigma X)$, for a fresh
   type variable $X$.

3. $\mathrm{typeinfer}(\Gamma \vdash \lambda x^A.M : B) = \bar\tau \circ \sigma$, where
   $\sigma = \mathrm{mgu}(B; A \to X)$ and
   $\tau = \mathrm{typeinfer}(\bar\sigma\Gamma, x : \bar\sigma A \vdash M : \bar\sigma X)$,
   for a fresh type variable $X$.

4. $\mathrm{typeinfer}(\Gamma \vdash \langle M, N \rangle : A) = \bar\rho \circ \bar\tau \circ \sigma$,
   where $\sigma = \mathrm{mgu}(A; X \times Y)$,
   $\tau = \mathrm{typeinfer}(\bar\sigma\Gamma \vdash M : \bar\sigma X)$, and
   $\rho = \mathrm{typeinfer}(\bar\tau\bar\sigma\Gamma \vdash N : \bar\tau\bar\sigma Y)$,
   for fresh type variables $X$ and $Y$.

5. $\mathrm{typeinfer}(\Gamma \vdash \pi_1 M : A) = \mathrm{typeinfer}(\Gamma \vdash M : A \times Y)$,
   for a fresh type variable $Y$.

6. $\mathrm{typeinfer}(\Gamma \vdash \pi_2 M : B) = \mathrm{typeinfer}(\Gamma \vdash M : X \times B)$,
   for a fresh type variable $X$.

7. $\mathrm{typeinfer}(\Gamma \vdash * : A) = \mathrm{mgu}(A; 1)$.

Strictly speaking, the algorithm is non-deterministic, because some of the clauses
involve choosing one or more fresh type variables, and the choice is arbitrary.
However, the choice is not essential, since we may regard all fresh type variables
as equivalent. Here, a type variable is called "fresh" if it has never been used.

Note that the algorithm typeinfer can fail; this happens if and only if the call to 
mgu fails in steps 1, 3, 4, or 7. 

Also note that the algorithm obviously always terminates; this follows by induction
on $M$, since each recursive call only uses a smaller term $M$.

Proposition 9.3. If there exists a substitution $\sigma$ such that
$\bar\sigma\Gamma \vdash M : \bar\sigma B$ is a valid typing judgment, then
$\mathrm{typeinfer}(\Gamma \vdash M : B)$ will return a most general such
substitution. Otherwise, the algorithm will fail.

Proof. The proof is similar to that of Proposition 9.1. $\Box$

Finally, the question "is $M$ typeable" can be answered by choosing distinct type
variables $X_1, \ldots, X_n, Y$ and applying the algorithm typeinfer to the typing
judgment $x_1 : X_1, \ldots, x_n : X_n \vdash M : Y$. Note that if the algorithm
succeeds and returns a substitution $\sigma$, then $\sigma Y$ is the most general type
of $M$, and the free variables have types
$x_1 : \sigma X_1, \ldots, x_n : \sigma X_n$.
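
The seven clauses of typeinfer can be sketched in code as follows. This is an illustrative sketch under assumptions, not the notes' own implementation: terms and templates are encoded as nested tuples, contexts as dictionaries, fresh variables are drawn from a counter with names beginning with `?` (assumed not to clash with user-chosen names), and `mgu` is a compact rendering of the unification clauses. All function names are hypothetical.

```python
import itertools

class Failure(Exception):
    pass

def apply_subst(s, a):
    if a[0] == "var":
        return s.get(a[1], a)
    if a[0] in ("fun", "prod"):
        return (a[0], apply_subst(s, a[1]), apply_subst(s, a[2]))
    return a

def compose(t, s):
    """The substitution that first applies s and then t."""
    out = {x: apply_subst(t, a) for x, a in s.items()}
    for x, a in t.items():
        out.setdefault(x, a)
    return out

def occurs(x, a):
    if a[0] == "var":
        return a[1] == x
    return any(occurs(x, c) for c in a[1:] if isinstance(c, tuple))

def mgu(a, b):
    """Most general unifier of two templates (compact form of clauses 1-11)."""
    if a == b:
        return {}
    if a[0] == "var" or b[0] == "var":
        x, t = (a[1], b) if a[0] == "var" else (b[1], a)
        if occurs(x, t):
            raise Failure("occurs check")
        return {x: t}
    if a[0] == b[0] and a[0] in ("fun", "prod"):
        rho = mgu(a[1], b[1])
        tau = mgu(apply_subst(rho, a[2]), apply_subst(rho, b[2]))
        return compose(tau, rho)
    raise Failure("constructor clash")

_counter = itertools.count()

def fresh():
    return ("var", f"?{next(_counter)}")

def apply_ctx(s, ctx):
    return {x: apply_subst(s, a) for x, a in ctx.items()}

def typeinfer(ctx, m, b):
    """Terms: ("v", x), ("app", M, N), ("lam", x, A, M), ("pair", M, N),
    ("pi1", M), ("pi2", M), ("star",)."""
    tag = m[0]
    if tag == "v":                                        # clause 1
        return mgu(ctx[m[1]], b)
    if tag == "app":                                      # clause 2
        x = fresh()
        sigma = typeinfer(ctx, m[1], ("fun", x, b))
        tau = typeinfer(apply_ctx(sigma, ctx), m[2], apply_subst(sigma, x))
        return compose(tau, sigma)
    if tag == "lam":                                      # clause 3
        _, var, a, body = m
        x = fresh()
        sigma = mgu(b, ("fun", a, x))
        inner = apply_ctx(sigma, ctx)
        inner[var] = apply_subst(sigma, a)
        tau = typeinfer(inner, body, apply_subst(sigma, x))
        return compose(tau, sigma)
    if tag == "pair":                                     # clause 4
        x, y = fresh(), fresh()
        sigma = mgu(b, ("prod", x, y))
        tau = typeinfer(apply_ctx(sigma, ctx), m[1], apply_subst(sigma, x))
        ts = compose(tau, sigma)
        rho = typeinfer(apply_ctx(ts, ctx), m[2], apply_subst(ts, y))
        return compose(rho, ts)
    if tag == "pi1":                                      # clause 5
        return typeinfer(ctx, m[1], ("prod", b, fresh()))
    if tag == "pi2":                                      # clause 6
        return typeinfer(ctx, m[1], ("prod", fresh(), b))
    if tag == "star":                                     # clause 7
        return mgu(b, ("unit",))
    raise ValueError(f"unknown term: {m!r}")

X, Y, Z = ("var", "X"), ("var", "Y"), ("var", "Z")
term = ("lam", "f", X, ("lam", "x", Z, ("app", ("v", "f"), ("v", "x"))))
sigma = typeinfer({}, term, Y)
print(apply_subst(sigma, Y))
```

For the closed term $\lambda f^X.\lambda x^Z.fx$, the printed template has the shape $(A \to B) \to (A \to B)$ for two distinct type variables $A$ and $B$, which is its most general type in the sense of Proposition 9.3.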



10 Denotational semantics 

We introduced the lambda calculus as the "theory of functions". But so far, we
have only spoken of functions in abstract terms. Do lambda terms correspond to
any actual functions, such as functions in set theory? And what about the notions
of $\beta$- and $\eta$-equivalence? We intuitively accepted these concepts as
expressing truths about the equality of functions. But do these properties really
hold of real functions? Are there other properties that functions have that are not
captured by $\beta\eta$-equivalence?

The word "semantics" comes from the Greek word for "meaning". Denotational 
semantics means to give meaning to a language by interpreting its terms as math- 
ematical objects. This is done by describing a function that maps syntactic objects 
(e.g., types, terms) to semantic objects (e.g., sets, elements). This function is 
called an interpretation or meaning function, and we usually denote it by
$[\![-]\!]$. Thus, if $M$ is a term, we will usually write $[\![M]\!]$ for the
meaning of $M$ under a given interpretation.

Any good denotational semantics should be compositional, which means that the
interpretation of a term should be given in terms of the interpretations of its
subterms. Thus, for example, $[\![MN]\!]$ should be a function of $[\![M]\!]$ and
$[\![N]\!]$.

Suppose that we have an axiomatic notion of equality $\simeq$ on terms (for instance,
$\beta\eta$-equivalence in the case of the lambda calculus). With respect to a
particular class of interpretations, soundness is the property

$$M \simeq N \;\Rightarrow\; [\![M]\!] = [\![N]\!] \text{ for all interpretations in the class.}$$

Completeness is the property

$$[\![M]\!] = [\![N]\!] \text{ for all interpretations in the class} \;\Rightarrow\; M \simeq N.$$

Depending on our viewpoint, we will either say the axioms are sound (with respect
to a given interpretation), or the interpretation is sound (with respect to a given set
of axioms). Similarly for completeness. Soundness expresses the fact that our axioms
(e.g., $\beta$ or $\eta$) are true with respect to the given interpretation.
Completeness expresses the fact that our axioms are sufficient.

10.1 Set-theoretic interpretation 

The simply-typed lambda calculus can be given a straightforward set-theoretic
interpretation as follows. We map types to sets and typing judgments to functions.
For each basic type $\iota$, assume that we have chosen a non-empty set $S_\iota$. We
can then associate a set $[\![A]\!]$ to each type $A$ recursively:

$$\begin{aligned}
[\![\iota]\!] &= S_\iota \\
[\![A \to B]\!] &= [\![B]\!]^{[\![A]\!]} \\
[\![A \times B]\!] &= [\![A]\!] \times [\![B]\!] \\
[\![1]\!] &= \{*\}
\end{aligned}$$

Here, for two sets $X, Y$, we write $Y^X$ for the set of all functions from $X$ to
$Y$, i.e., $Y^X = \{f \mid f : X \to Y\}$. Of course, $X \times Y$ denotes the usual
cartesian product of sets, and $\{*\}$ is some singleton set.

We can now interpret lambda terms, or more precisely, typing judgments, as certain
functions. Intuitively, we already know which function a typing judgment corresponds
to. For instance, the typing judgment $x : A, f : A \to B \vdash f x : B$ corresponds
to the function that takes an element $x \in [\![A]\!]$ and an element
$f \in [\![B]\!]^{[\![A]\!]}$, and that returns $f(x) \in [\![B]\!]$. In general, the
interpretation of a typing judgment

$$x_1 : A_1, \ldots, x_n : A_n \vdash M : B$$

will be a function

$$[\![A_1]\!] \times \ldots \times [\![A_n]\!] \to [\![B]\!].$$

Which particular function it is depends of course on the term $M$. For convenience,
if $\Gamma = x_1 : A_1, \ldots, x_n : A_n$ is a context, let us write
$[\![\Gamma]\!] = [\![A_1]\!] \times \ldots \times [\![A_n]\!]$. We now define
$[\![\Gamma \vdash M : B]\!]$ by recursion on $M$.

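• If $M$ is a variable, we define

  $$[\![x_1 : A_1, \ldots, x_n : A_n \vdash x_i : A_i]\!] = \pi_i : [\![A_1]\!] \times \ldots \times [\![A_n]\!] \to [\![A_i]\!],$$

  where $\pi_i(a_1, \ldots, a_n) = a_i$.

• If $M = NP$ is an application, we recursively calculate

  $$\begin{aligned}
  f &= [\![\Gamma \vdash N : A \to B]\!] : [\![\Gamma]\!] \to [\![B]\!]^{[\![A]\!]}, \\
  g &= [\![\Gamma \vdash P : A]\!] : [\![\Gamma]\!] \to [\![A]\!].
  \end{aligned}$$

  We then define

  $$[\![\Gamma \vdash NP : B]\!] = h : [\![\Gamma]\!] \to [\![B]\!]$$

  by $h(\bar a) = f(\bar a)(g(\bar a))$, for all $\bar a \in [\![\Gamma]\!]$.

• If $M = \lambda x^A.N$ is an abstraction, we recursively calculate

  $$f = [\![\Gamma, x : A \vdash N : B]\!] : [\![\Gamma]\!] \times [\![A]\!] \to [\![B]\!].$$

  We then define

  $$[\![\Gamma \vdash \lambda x^A.N : A \to B]\!] = h : [\![\Gamma]\!] \to [\![B]\!]^{[\![A]\!]}$$

  by $h(\bar a)(a) = f(\bar a, a)$, for all $\bar a \in [\![\Gamma]\!]$ and
  $a \in [\![A]\!]$.

• If $M = \langle N, P \rangle$ is a pair, we recursively calculate

  $$\begin{aligned}
  f &= [\![\Gamma \vdash N : A]\!] : [\![\Gamma]\!] \to [\![A]\!], \\
  g &= [\![\Gamma \vdash P : B]\!] : [\![\Gamma]\!] \to [\![B]\!].
  \end{aligned}$$

  We then define

  $$[\![\Gamma \vdash \langle N, P \rangle : A \times B]\!] = h : [\![\Gamma]\!] \to [\![A]\!] \times [\![B]\!]$$

  by $h(\bar a) = (f(\bar a), g(\bar a))$, for all $\bar a \in [\![\Gamma]\!]$.

• If $M = \pi_i N$ is a projection (for $i = 1, 2$), we recursively calculate

  $$f = [\![\Gamma \vdash N : B_1 \times B_2]\!] : [\![\Gamma]\!] \to [\![B_1]\!] \times [\![B_2]\!].$$

  We then define

  $$[\![\Gamma \vdash \pi_i N : B_i]\!] = h : [\![\Gamma]\!] \to [\![B_i]\!]$$

  by $h(\bar a) = \pi_i(f(\bar a))$, for all $\bar a \in [\![\Gamma]\!]$. Here $\pi_i$
  in the meta-language denotes the set-theoretic function
  $\pi_i : [\![B_1]\!] \times [\![B_2]\!] \to [\![B_i]\!]$ given by
  $\pi_i(b_1, b_2) = b_i$.

• If $M = *$, we define

  $$[\![\Gamma \vdash * : 1]\!] = h : [\![\Gamma]\!] \to \{*\}$$

  by $h(\bar a) = *$, for all $\bar a \in [\![\Gamma]\!]$.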

To minimize notational inconvenience, we will occasionally abuse the notation
and write $[\![M]\!]$ instead of $[\![\Gamma \vdash M : B]\!]$, thus pretending that terms are typing
judgments. However, this is only an abbreviation, and it will be understood that 
the interpretation really depends on the typing judgment, and not just the term, 
even if we use the abbreviated notation. 
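
The recursive definition of the interpretation can be mimicked by a small program. The following sketch rests on several assumptions: Python functions and tuples stand in for set-theoretic functions and cartesian products, the string `"*"` stands for the single element of $\{*\}$, types are not tracked at all, and the tuple encoding of terms and the name `interp` are hypothetical.

```python
def interp(ctx, m):
    """Denotation of a term m in a context ctx (a tuple of variable names):
    a function from environment tuples to values.
    Terms: ("v", x), ("app", N, P), ("lam", x, N), ("pair", N, P),
    ("pi1", N), ("pi2", N), ("star",)."""
    tag = m[0]
    if tag == "v":
        i = len(ctx) - 1 - ctx[::-1].index(m[1])   # innermost binding wins
        return lambda env: env[i]                  # the projection pi_i
    if tag == "app":
        f, g = interp(ctx, m[1]), interp(ctx, m[2])
        return lambda env: f(env)(g(env))          # h(a) = f(a)(g(a))
    if tag == "lam":
        f = interp(ctx + (m[1],), m[2])
        return lambda env: lambda a: f(env + (a,)) # h(a)(a') = f(a, a')
    if tag == "pair":
        f, g = interp(ctx, m[1]), interp(ctx, m[2])
        return lambda env: (f(env), g(env))
    if tag in ("pi1", "pi2"):
        f = interp(ctx, m[1])
        k = 0 if tag == "pi1" else 1
        return lambda env: f(env)[k]
    if tag == "star":
        return lambda env: "*"
    raise ValueError(f"unknown term: {m!r}")

# The judgment  x:A, f:A->B |- f x : B  from the text, with A = B = integers.
fx = ("app", ("v", "f"), ("v", "x"))
h = interp(("x", "f"), fx)
print(h((3, lambda n: n + 1)))   # 4
```

Because each case only combines the interpretations of the immediate subterms, the sketch is compositional in the sense described at the start of this section.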

10.2 Soundness 

Lemma 10.1 (Context change). The interpretation behaves as expected under
reordering of contexts and under the addition of dummy variables to contexts.
More precisely, if $\sigma : \{1, \ldots, n\} \to \{1, \ldots, m\}$ is an injective
map, and if the free variables of $M$ are among
$x_{\sigma 1}, \ldots, x_{\sigma n}$, then the interpretations of the two typing
judgments,

$$\begin{aligned}
f &= [\![x_1 : A_1, \ldots, x_m : A_m \vdash M : B]\!] : [\![A_1]\!] \times \ldots \times [\![A_m]\!] \to [\![B]\!], \\
g &= [\![x_{\sigma 1} : A_{\sigma 1}, \ldots, x_{\sigma n} : A_{\sigma n} \vdash M : B]\!] : [\![A_{\sigma 1}]\!] \times \ldots \times [\![A_{\sigma n}]\!] \to [\![B]\!]
\end{aligned}$$

are related as follows:

$$f(a_1, \ldots, a_m) = g(a_{\sigma 1}, \ldots, a_{\sigma n}),$$

for all $a_1 \in [\![A_1]\!], \ldots, a_m \in [\![A_m]\!]$.

Proof. Easy, but tedious, induction on $M$. $\Box$


The significance of this lemma is that, to a certain extent, the context does not
matter. Thus, if the free variables of $M$ and $N$ are contained in $\Gamma$ as well
as $\Gamma'$, then we have

$$[\![\Gamma \vdash M : B]\!] = [\![\Gamma \vdash N : B]\!] \quad\text{iff}\quad [\![\Gamma' \vdash M : B]\!] = [\![\Gamma' \vdash N : B]\!].$$

Thus, whether $M$ and $N$ have equal denotations only depends on $M$ and $N$, and
not on $\Gamma$.

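Lemma 10.2 (Substitution Lemma). If

$$\begin{aligned}
[\![\Gamma, x : A \vdash M : B]\!] &= f : [\![\Gamma]\!] \times [\![A]\!] \to [\![B]\!] \quad\text{and} \\
[\![\Gamma \vdash N : A]\!] &= g : [\![\Gamma]\!] \to [\![A]\!],
\end{aligned}$$

then

$$[\![\Gamma \vdash M[N/x] : B]\!] = h : [\![\Gamma]\!] \to [\![B]\!],$$

where $h(\bar a) = f(\bar a, g(\bar a))$, for all $\bar a \in [\![\Gamma]\!]$.

Proof. Very easy, but very tedious, induction on $M$. $\Box$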

Proposition 10.3 (Soundness). The set-theoretic interpretation is sound for
$\beta\eta$-reasoning. In other words,

$$M =_{\beta\eta} N \;\Rightarrow\; [\![\Gamma \vdash M : B]\!] = [\![\Gamma \vdash N : B]\!].$$

Proof. Let us write $M \sim N$ if
$[\![\Gamma \vdash M : B]\!] = [\![\Gamma \vdash N : B]\!]$. By the remark after
Lemma 10.1, this notion is independent of $\Gamma$, and thus a well-defined
relation on terms (as opposed to typing judgments). To prove soundness, we must
show that $M =_{\beta\eta} N$ implies $M \sim N$, for all $M$ and $N$. It suffices to
show that $\sim$ satisfies all the axioms of $\beta\eta$-equivalence.

The axioms (refl), (symm), and (trans) hold trivially. Similarly, all the (cong) and
($\xi$) rules hold, due to the fact that the meaning of composite terms was defined
solely in terms of the meaning of their subterms. It remains to prove that each of
the various ($\beta$) and ($\eta$) laws is satisfied (see page 61). We prove the rule
($\beta_{\to}$) as an example; the remaining rules are left as an exercise.

Assume $\Gamma$ is a context such that $\Gamma, x : A \vdash M : B$ and
$\Gamma \vdash N : A$. Let

$$\begin{aligned}
f &= [\![\Gamma, x : A \vdash M : B]\!] : [\![\Gamma]\!] \times [\![A]\!] \to [\![B]\!], \\
g &= [\![\Gamma \vdash N : A]\!] : [\![\Gamma]\!] \to [\![A]\!], \\
h &= [\![\Gamma \vdash (\lambda x^A.M) : A \to B]\!] : [\![\Gamma]\!] \to [\![B]\!]^{[\![A]\!]}, \\
k &= [\![\Gamma \vdash (\lambda x^A.M)N : B]\!] : [\![\Gamma]\!] \to [\![B]\!], \\
l &= [\![\Gamma \vdash M[N/x] : B]\!] : [\![\Gamma]\!] \to [\![B]\!].
\end{aligned}$$

We must show $k = l$. By definition, we have
$k(\bar a) = h(\bar a)(g(\bar a)) = f(\bar a, g(\bar a))$. On the other hand,
$l(\bar a) = f(\bar a, g(\bar a))$ by the substitution lemma. $\Box$

Note that the proof of soundness amounts to a simple calculation; while there are 
many details to attend to, no particularly interesting new idea is required. This 
is typical of soundness proofs in general. Completeness, on the other hand, is 
usually much more difficult to prove and often requires clever ideas. 

10.3 Completeness 

We cite two completeness theorems for the set-theoretic interpretation. The first 
one is for the class of all models with finite base type. The second one is for the 
single model with one countably infinite base type. 

Theorem 10.4 (Completeness, Plotkin, 1973). The class of set-theoretic models
with finite base types is complete for the lambda-$\beta\eta$ calculus.

Recall that completeness for a class of models means that if
$[\![M]\!] = [\![N]\!]$ holds in all models of the given class, then
$M =_{\beta\eta} N$. This is not the same as completeness for each individual model
in the class.

Note that, for each fixed choice of finite sets as the interpretations of the base
types, there are some lambda terms such that $[\![M]\!] = [\![N]\!]$ but
$M \neq_{\beta\eta} N$. For instance, consider terms of type
$(\iota \to \iota) \to \iota \to \iota$. There are infinitely many
$\beta\eta$-distinct terms of this type, namely, the Church numerals. On the other
hand, if $S_\iota$ is a finite set, then
$[\![(\iota \to \iota) \to \iota \to \iota]\!]$ is also a finite set. Since a finite
set cannot have infinitely many distinct elements, there must necessarily be two
distinct Church numerals $M, N$ such that $[\![M]\!] = [\![N]\!]$.
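
The counting argument can be made concrete for a small base set. The sketch below assumes the two-element choice $S_\iota = \{0, 1\}$ and represents each function $S_\iota \to S_\iota$ by its table of values; the names `church` and `denotation` are hypothetical.

```python
from itertools import product

S = (0, 1)                                   # a two-element choice for S_iota
# Every function S -> S, represented by its table of values.
functions = [dict(zip(S, images)) for images in product(S, repeat=len(S))]

def church(n):
    """The n-th Church numeral, read as a map (S -> S) -> (S -> S)."""
    def numeral(f):
        def iterate(x):
            for _ in range(n):
                x = f[x]
            return x
        return iterate
    return numeral

def denotation(n):
    """The numeral's full table of values, which determines it as a function."""
    return tuple(church(n)(f)(x) for f in functions for x in S)

print(denotation(1) == denotation(3))            # True
print(len({denotation(n) for n in range(10)}))   # 3
```

In this model the Church numerals 1 and 3 receive the same denotation, although they are not $\beta\eta$-equivalent. Enlarging $S_\iota$ separates these two numerals, but some other pair will still collide, which is why completeness only holds for the class of all finite models taken together.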

Plotkin's completeness theorem, on the other hand, shows that whenever $M$ and
$N$ are distinct lambda terms, then there exists some set-theoretic model with finite
base types in which $M$ and $N$ are different.

The second completeness theorem is for a single model, namely the one where
$S_\iota$ is a countably infinite set.

Theorem 10.5 (Completeness, Friedman, 1975). The set-theoretic model with
base type equal to $\mathbb{N}$, the set of natural numbers, is complete for the
lambda-$\beta\eta$ calculus.

We omit the proofs.

11 The language PCF

PCF stands for "programming with computable functions". The language PCF is 
an extension of the simply-typed lambda calculus with booleans, natural numbers, 
and recursion. It was first introduced by Dana Scott as a simple programming lan- 
guage on which to try out techniques for reasoning about programs. Although PCF 
is not intended as a "real world" programming language, many real programming 
languages can be regarded as (syntactic variants of) extensions of PCF, and many 
of the reasoning techniques developed for PCF also apply to more complicated
languages.

PCF is a "programming language", not just a "calculus". By this we mean, PCF 
is equipped with a specific evaluation order, or rules that determine precisely how 
terms are to be evaluated. We follow the slogan: 

Programming language = syntax + evaluation rules. 

After introducing the syntax of PCF, we will look at three different equivalence
relations on terms.

• Axiomatic equivalence $=_{ax}$ will be given by axioms in the spirit of
  $\beta\eta$-equivalence.

• Operational equivalence $=_{op}$ will be defined in terms of the operational
  behavior of terms. Two terms are operationally equivalent if one can be
  substituted for the other in any context without changing the behavior of a
  program.

• Denotational equivalence $=_{den}$ is defined via a denotational semantics.

We will develop methods for reasoning about these equivalences, and thus for 
reasoning about programs. We will also investigate how the three equivalences 
are related to each other. 



11.1 Syntax and typing rules 

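PCF types are simple types over two base types bool and nat.

$$A, B ::= \mathbf{bool} \mid \mathbf{nat} \mid A \to B \mid A \times B \mid 1$$

$$\begin{array}{ll}
(\textit{true}) & \Gamma \vdash \mathbf{T} : \mathbf{bool} \\[1ex]
(\textit{false}) & \Gamma \vdash \mathbf{F} : \mathbf{bool} \\[1ex]
(\textit{zero}) & \Gamma \vdash \mathbf{zero} : \mathbf{nat} \\[1ex]
(\textit{succ}) & \frac{\Gamma \vdash M : \mathbf{nat}}{\Gamma \vdash \mathbf{succ}(M) : \mathbf{nat}} \\[2ex]
(\textit{pred}) & \frac{\Gamma \vdash M : \mathbf{nat}}{\Gamma \vdash \mathbf{pred}(M) : \mathbf{nat}} \\[2ex]
(\textit{iszero}) & \frac{\Gamma \vdash M : \mathbf{nat}}{\Gamma \vdash \mathbf{iszero}(M) : \mathbf{bool}} \\[2ex]
(\textit{fix}) & \frac{\Gamma \vdash M : A \to A}{\Gamma \vdash \mathbf{Y}(M) : A} \\[2ex]
(\textit{if}) & \frac{\Gamma \vdash M : \mathbf{bool} \quad \Gamma \vdash N : A \quad \Gamma \vdash P : A}{\Gamma \vdash \mathbf{if}\ M\ \mathbf{then}\ N\ \mathbf{else}\ P : A}
\end{array}$$

Table 6: Typing rules for PCF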

The raw terms of PCF are those of the simply-typed lambda calculus, together 
with some additional constructs that deal with booleans, natural numbers, and 
recursion. 



$$\begin{array}{rcl}
M, N, P & ::= & x \mid MN \mid \lambda x^A.M \mid \langle M, N \rangle \mid \pi_1 M \mid \pi_2 M \mid * \\
 & \mid & \mathbf{T} \mid \mathbf{F} \mid \mathbf{zero} \mid \mathbf{succ}(M) \mid \mathbf{pred}(M) \\
 & \mid & \mathbf{iszero}(M) \mid \mathbf{if}\ M\ \mathbf{then}\ N\ \mathbf{else}\ P \mid \mathbf{Y}(M)
\end{array}$$

The intended meaning of these terms is the same as that of the corresponding
terms we used to program in the untyped lambda calculus: T and F are the
boolean constants, zero is the constant zero, succ and pred are the successor
and predecessor functions, iszero tests whether a given number is equal to zero,
if $M$ then $N$ else $P$ is a conditional, and Y($M$) is a fixpoint of $M$.

The typing rules for PCF are the same as the typing rules for the simply-typed
lambda calculus, shown in Table 2, plus the additional typing rules shown in
Table 6.

11.2 Axiomatic equivalence 

The axiomatic equivalence of PCF is based on the $\beta\eta$-equivalence of the
simply-typed lambda calculus. The relation $=_{ax}$ is the least relation given by the
following:

• All the $\beta$- and $\eta$-axioms of the simply-typed lambda calculus, as shown on
  page 61.

• One congruence or $\xi$-rule for each term constructor. This means, for instance,

  $$\frac{M =_{ax} M' \quad N =_{ax} N' \quad P =_{ax} P'}{\mathbf{if}\ M\ \mathbf{then}\ N\ \mathbf{else}\ P \;=_{ax}\; \mathbf{if}\ M'\ \mathbf{then}\ N'\ \mathbf{else}\ P'},$$

  and similar for all the other term constructors.

• The additional axioms shown in Table 7. Here, $\underline{n}$ stands for a numeral,
  i.e., a term of the form $\mathbf{succ}(\ldots(\mathbf{succ}(\mathbf{zero}))\ldots)$.

$$\begin{array}{rcl}
\mathbf{pred}(\mathbf{zero}) & = & \mathbf{zero} \\
\mathbf{pred}(\mathbf{succ}(\underline{n})) & = & \underline{n} \\
\mathbf{iszero}(\mathbf{zero}) & = & \mathbf{T} \\
\mathbf{iszero}(\mathbf{succ}(\underline{n})) & = & \mathbf{F} \\
\mathbf{if}\ \mathbf{T}\ \mathbf{then}\ N\ \mathbf{else}\ P & = & N \\
\mathbf{if}\ \mathbf{F}\ \mathbf{then}\ N\ \mathbf{else}\ P & = & P \\
\mathbf{Y}(M) & = & M(\mathbf{Y}(M))
\end{array}$$

Table 7: Axiomatic equivalence for PCF



11.3 Operational semantics 

The operational semantics of PCF is commonly given in two different styles: the 
small-step or shallow style, and the big-step or deep style. We give the small-step 
semantics first, because it is closer to the notion of $\beta$-reduction that we considered
for the simply-typed lambda calculus. 

There are some important differences between an operational semantics, as we 
are going to give it here, and the notion of $\beta$-reduction in the simply-typed lambda
calculus. Most importantly, the operational semantics is going to be deterministic, 
which means, each term can be reduced in at most one way. Thus, there will never 
be a choice between more than one redex. Or in other words, it will always be 
uniquely specified which redex to reduce next. 

As a consequence of the previous paragraph, we will abandon many of the congruence
rules, as well as the ($\xi$)-rule. We adopt the following informal conventions:

• never reduce the body of a lambda abstraction,

• never reduce the argument of a function (except for primitive functions such
  as succ and pred),

• never reduce the "then" or "else" part of an if-then-else statement,

• never reduce a term inside a pair.

Of course, the terms that these rules prevent from being reduced can neverthe- 
less become subject to reduction later: the body of a lambda abstraction and the 
argument of a function can be reduced after a $\beta$-reduction causes the $\lambda$ to
disappear and the argument to be substituted in the body. The "then" or "else" parts
of an if-then-else term can be reduced after the "if" part evaluates to true or false. 
And the terms inside a pair can be reduced after the pair has been broken up by a 
projection. 

An important technical notion is that of a value, which is a term that represents
the result of a computation and cannot be reduced further. Values are given as
follows:

Values:  $V, W ::= \mathbf{T} \mid \mathbf{F} \mid \mathbf{zero} \mid \mathbf{succ}(V) \mid * \mid \langle M, N \rangle \mid \lambda x^A.M$

The transition rules for the small-step operational semantics of PCF are shown in
Table 8.

We write $M \to N$ if $M$ reduces to $N$ by these rules. We write $M \not\to$ if there
does not exist $N$ such that $M \to N$. The first two important technical properties
of small-step reduction are summarized in the following lemma.

Lemma 11.1. 1. Values are normal forms. If $V$ is a value, then $V \not\to$.
2. Evaluation is deterministic. If $M \to N$ and $M \to N'$, then $N = N'$.

Another important property is subject reduction: a well-typed term reduces only
to another well-typed term of the same type.

Lemma 11.2 (Subject Reduction). If $\Gamma \vdash M : A$ and $M \to N$, then
$\Gamma \vdash N : A$.

Next, we want to prove that the evaluation of a well-typed term does not get
"stuck". If $M$ is some term such that $M \not\to$, but $M$ is not a value, then we
regard this as an error, and we also write $M \to \mathbf{error}$. Examples of such
terms are $\pi_1(\lambda x.M)$ and $\langle M, N \rangle P$. The following lemma shows
that well-typed closed terms cannot lead to such errors.



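$$\begin{array}{ll}
\frac{M \to N}{\mathbf{pred}(M) \to \mathbf{pred}(N)} &
\frac{M \to N}{MP \to NP} \\[2ex]
\mathbf{pred}(\mathbf{zero}) \to \mathbf{zero} &
(\lambda x^A.M)N \to M[N/x] \\[1ex]
\mathbf{pred}(\mathbf{succ}(V)) \to V &
\frac{M \to M'}{\pi_i M \to \pi_i M'} \\[2ex]
\frac{M \to N}{\mathbf{iszero}(M) \to \mathbf{iszero}(N)} &
\pi_1 \langle M, N \rangle \to M \\[2ex]
\mathbf{iszero}(\mathbf{zero}) \to \mathbf{T} &
\pi_2 \langle M, N \rangle \to N \\[1ex]
\mathbf{iszero}(\mathbf{succ}(V)) \to \mathbf{F} &
\frac{M : 1, \quad M \neq *}{M \to *} \\[2ex]
\frac{M \to N}{\mathbf{succ}(M) \to \mathbf{succ}(N)} &
\frac{M \to M'}{\mathbf{if}\ M\ \mathbf{then}\ N\ \mathbf{else}\ P \to \mathbf{if}\ M'\ \mathbf{then}\ N\ \mathbf{else}\ P} \\[2ex]
\mathbf{Y}(M) \to M(\mathbf{Y}(M)) &
\mathbf{if}\ \mathbf{T}\ \mathbf{then}\ N\ \mathbf{else}\ P \to N \\[1ex]
 &
\mathbf{if}\ \mathbf{F}\ \mathbf{then}\ N\ \mathbf{else}\ P \to P
\end{array}$$

Table 8: Small-step operational semantics of PCF

Lemma 11.3 (Progress). If $M$ is a closed, well-typed term, then either $M$ is a
value, or else there exists $N$ such that $M \to N$.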

The Progress Lemma is very important, because it implies that a well-typed term 
cannot "go wrong". It guarantees that a well-typed term will either evaluate to a 
value in finitely many steps, or else it will reduce infinitely and thus not terminate. 
But a well-typed term can never generate an error. In programming language 
terms, a term that type-checks at compile-time cannot generate an error at run- 
time. 
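
The small-step rules can be turned into a one-step reduction function. The sketch below is an untyped illustration under stated assumptions: terms are encoded as nested tuples, the type-directed rule for terms of type 1 is omitted because the sketch carries no types, substitution is only correct when the substituted term is closed (which holds when a closed program is evaluated), and the names `is_value`, `subst`, `step`, `evaluate`, and `numeral` are hypothetical.

```python
# Terms: ("var", x), ("app", M, N), ("lam", x, M), ("pair", M, N), ("pi", i, M),
# ("star",), ("T",), ("F",), ("zero",), ("succ", M), ("pred", M), ("iszero", M),
# ("if", M, N, P), ("Y", M)

def is_value(m):
    if m[0] == "succ":
        return is_value(m[1])
    return m[0] in ("T", "F", "zero", "star", "pair", "lam")

def subst(m, x, n):
    """M[N/x]; capture is not an issue here because N is assumed closed."""
    tag = m[0]
    if tag == "var":
        return n if m[1] == x else m
    if tag == "lam":
        return m if m[1] == x else ("lam", m[1], subst(m[2], x, n))
    if tag == "pi":
        return ("pi", m[1], subst(m[2], x, n))
    return (tag,) + tuple(subst(c, x, n) for c in m[1:])

def step(m):
    """Return the unique N with M -> N, or None if M has no reduction."""
    tag = m[0]
    if tag in ("succ", "pred", "iszero"):
        arg = m[1]
        r = step(arg)
        if r is not None:
            return (tag, r)                       # reduce inside the primitive
        if tag != "succ" and arg == ("zero",):
            return ("zero",) if tag == "pred" else ("T",)
        if tag != "succ" and arg[0] == "succ" and is_value(arg):
            return arg[1] if tag == "pred" else ("F",)
        return None
    if tag == "app":
        if m[1][0] == "lam":
            return subst(m[1][2], m[1][1], m[2])  # beta; argument not reduced
        r = step(m[1])
        return None if r is None else ("app", r, m[2])
    if tag == "pi":
        if m[2][0] == "pair":
            return m[2][m[1]]                     # component 1 or 2 of the pair
        r = step(m[2])
        return None if r is None else ("pi", m[1], r)
    if tag == "if":
        if m[1] in (("T",), ("F",)):
            return m[2] if m[1] == ("T",) else m[3]
        r = step(m[1])
        return None if r is None else ("if", r, m[2], m[3])
    if tag == "Y":
        return ("app", m[1], m)                   # Y(M) -> M(Y(M))
    return None                                   # values and free variables

def evaluate(m, fuel=10_000):
    """Iterate step until no rule applies, giving up after fuel steps."""
    for _ in range(fuel):
        r = step(m)
        if r is None:
            return m
        m = r
    raise RuntimeError("step budget exhausted")

def numeral(k):
    return ("zero",) if k == 0 else ("succ", numeral(k - 1))

v = lambda x: ("var", x)
body = ("if", ("iszero", v("m")), v("n"),
        ("succ", ("app", ("app", v("f"), ("pred", v("m"))), v("n"))))
add = ("Y", ("lam", "f", ("lam", "m", ("lam", "n", body))))
print(evaluate(("app", ("app", add, numeral(2)), numeral(3))) == numeral(5))  # True
```

A term such as `("pi", 1, ("lam", "x", v("x")))` is neither a value nor reducible, so `step` returns `None` for it; this is the kind of stuck term that the Progress Lemma rules out for well-typed closed terms. A term that reduces forever, such as `("Y", ("lam", "x", v("x")))`, exhausts the step budget instead.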

To express this idea formally, let us write $M \to^{*} N$ in the usual way if $M$
reduces to $N$ in zero or more steps, and let us write $M \to^{*} \mathbf{error}$ if
$M$ reduces in zero or more steps to an error.

Proposition 11.4 (Safety). If $M$ is a closed, well-typed term, then
$M \not\to^{*} \mathbf{error}$.

Exercise 33. Prove Lemmas 11.1-11.3 and Proposition 11.4.

Table 9: Big-step operational semantics of PCF (of the table body, only the premise
$M(\mathbf{Y}(M)) \Downarrow V$ of the rule for $\mathbf{Y}(M)$ is preserved)

11.4 Big-step semantics

In the small-step semantics, if $M \to^{*} V$, we say that $M$ evaluates to $V$. Note
that by determinacy, for every $M$, there exists at most one $V$ such that
$M \to^{*} V$.

It is also possible to axiomatize the relation "$M$ evaluates to $V$" directly. This
is known as the big-step semantics. Here, we write $M \Downarrow V$ if $M$ evaluates
to $V$. The axioms for the big-step semantics are shown in Table 9.

The big-step semantics satisfies properties similar to those of the small-step se- 
mantics. 

Lemma 11.5. 1. Values. For all values $V$, we have $V \Downarrow V$.

2. Determinacy. If $M \Downarrow V$ and $M \Downarrow V'$, then $V = V'$.

3. Subject Reduction. If $\Gamma \vdash M : A$ and $M \Downarrow V$, then
   $\Gamma \vdash V : A$.

The analogues of the Progress and Safety properties cannot be as easily stated for
big-step reduction, because we cannot easily talk about a single reduction step or
about infinite reduction sequences. However, some comfort can be taken in the
fact that the big-step semantics and small-step semantics coincide:

Proposition 11.6. $M \to^{*} V$ iff $M \Downarrow V$.
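
A big-step evaluator computes the value of a term in one recursive pass instead of iterating single steps. The sketch below is one plausible reading, chosen to agree with the small-step rules in the sense of Proposition 11.6; apart from the rule deriving $\mathbf{Y}(M) \Downarrow V$ from $M(\mathbf{Y}(M)) \Downarrow V$, the individual cases are assumptions rather than a transcription of Table 9. The tuple encoding, the exception `Stuck`, and the function names are hypothetical, the rule for terms of type 1 is omitted, and substituted terms are assumed closed.

```python
# Terms: ("var", x), ("app", M, N), ("lam", x, M), ("pair", M, N), ("pi", i, M),
# ("star",), ("T",), ("F",), ("zero",), ("succ", M), ("pred", M), ("iszero", M),
# ("if", M, N, P), ("Y", M)

class Stuck(Exception):
    """Raised when no evaluation rule applies."""

def subst(m, x, n):
    """M[N/x], assuming N is closed."""
    tag = m[0]
    if tag == "var":
        return n if m[1] == x else m
    if tag == "lam":
        return m if m[1] == x else ("lam", m[1], subst(m[2], x, n))
    if tag == "pi":
        return ("pi", m[1], subst(m[2], x, n))
    return (tag,) + tuple(subst(c, x, n) for c in m[1:])

def big(m):
    """Return the value V with M evaluating to V; may recurse forever if M diverges."""
    tag = m[0]
    if tag in ("T", "F", "zero", "star", "pair", "lam"):
        return m                                   # values evaluate to themselves
    if tag == "succ":
        return ("succ", big(m[1]))
    if tag in ("pred", "iszero"):
        val = big(m[1])
        if val == ("zero",):
            return val if tag == "pred" else ("T",)
        if val[0] == "succ":
            return val[1] if tag == "pred" else ("F",)
    elif tag == "app":
        f = big(m[1])
        if f[0] == "lam":
            return big(subst(f[2], f[1], m[2]))    # argument passed unevaluated
    elif tag == "pi":
        p = big(m[2])
        if p[0] == "pair":
            return big(p[m[1]])
    elif tag == "if":
        c = big(m[1])
        if c in (("T",), ("F",)):
            return big(m[2] if c == ("T",) else m[3])
    elif tag == "Y":
        return big(("app", m[1], m))               # premise M(Y(M)) evaluates to V
    raise Stuck(f"no rule applies to {m!r}")

def numeral(k):
    return ("zero",) if k == 0 else ("succ", numeral(k - 1))

v = lambda x: ("var", x)
body = ("if", ("iszero", v("m")), v("n"),
        ("succ", ("app", ("app", v("f"), ("pred", v("m"))), v("n"))))
add = ("Y", ("lam", "f", ("lam", "m", ("lam", "n", body))))
print(big(("app", ("app", add, numeral(2)), numeral(3))) == numeral(5))   # True
```

Note that divergence shows up here as unbounded recursion rather than as an infinite sequence of steps, which illustrates why Progress and Safety are harder to state in the big-step style.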



11.5 Operational equivalence 

Informally, two terms $M$ and $N$ will be called operationally equivalent if $M$ and
$N$ are interchangeable as part of any larger program, without changing the ob-
servable behavior of the program. This notion of equivalence is also often called 
observational equivalence, to emphasize the fact that it concentrates on observable 
properties of terms. 

What is an observable behavior of a program? Normally, what we observe about a 
program is its output, such as the characters it prints to a terminal. Since any such 
characters can be converted in principle to natural numbers, we take the point of 
view that the observable behavior of a program is a natural number that it evaluates 
to. Similarly, if a program computes a boolean, we regard the boolean value as 
observable. However, we do not regard abstract values, such as functions, as 
being directly observable, on the grounds that a function cannot be observed until 
we supply it some arguments and observe the result. 

Definition. An observable type is either bool or nat. A result is a closed value
of observable type. Thus, a result is either T, F, or a numeral $\underline{n}$. A
program is a closed term of observable type.

A context is a term with a hole, written $C[-]$. Formally, the class of contexts is
defined by a BNF:

$$C[-] ::= [-] \mid x \mid C[-]N \mid MC[-] \mid \lambda x^A.C[-] \mid \ldots$$

and so on, extending through all the cases in the definition of a PCF term.

Well-typed contexts are defined in the same way as well-typed terms, where it
is understood that the hole also has a type. The free variables of a context are
defined in the same way as for terms. Moreover, we define the captured variables
of a context to be those bound variables whose scope includes the hole. So for
instance, in the context $(\lambda x.[-])(\lambda y.z)$, the variable $x$ is captured,
the variable $z$ is free, and $y$ is neither free nor captured.

If $C[-]$ is a context and $M$ is a term of the appropriate type, we write $C[M]$ for
the result of replacing the hole in the context $C[-]$ by $M$. Here, we do not
$\alpha$-rename any bound variables, so that we allow free variables of $M$ to be
captured by $C[-]$.

We are now ready to state the definition of operational equivalence.

Definition. Two terms $M, N$ are operationally equivalent, in symbols $M =_{op} N$,
if for all closed and closing contexts $C[-]$ of observable type and all values $V$,

$$C[M] \Downarrow V \iff C[N] \Downarrow V.$$

Here, by a closing context we mean that $C[-]$ should capture all the free variables
of $M$ and $N$. This is equivalent to requiring that $C[M]$ and $C[N]$ are closed
terms of observable types, i.e., programs. Thus, two terms are equivalent if they can
be used interchangeably in any program.

11.6 Operational approximation 

As a refinement of operational equivalence, we can also define a notion of
operational approximation: We say that $M$ operationally approximates $N$, in symbols
$M \sqsubseteq_{op} N$, if for all closed and closing contexts $C[-]$ of observable
type and all values $V$,

$$C[M] \Downarrow V \;\Rightarrow\; C[N] \Downarrow V.$$

Note that this definition includes the case where $C[M]$ diverges, but $C[N]$
converges, for some $N$. This formalizes the notion that $N$ is "more defined" than
$M$. Clearly, we have $M =_{op} N$ iff $M \sqsubseteq_{op} N$ and
$N \sqsubseteq_{op} M$. Thus, we get a partial order $\sqsubseteq_{op}$ on the set of
all terms of a given type, modulo operational equivalence. Also, this partial order
has a least element, namely if we let $\Omega = \mathbf{Y}(\lambda x.x)$, then
$\Omega \sqsubseteq_{op} N$ for any term $N$ of the appropriate type.

Note that, in general, $\sqsubseteq_{op}$ is not a complete partial order, due to
missing limits of $\omega$-chains.



11.7 Discussion of operational equivalence 

Operational equivalence is a very useful concept for reasoning about programs,
and particularly for reasoning about program fragments. If M and N are
operationally equivalent, then we know that we can replace M by N in any program
without affecting its behavior. For example, M could be a slow, but simple
subroutine for sorting a list. The term N could be a replacement that runs much faster.
If we can prove M and N to be operationally equivalent, then this means we can
safely use the faster routine instead of the slower one.

Another example is compiler optimization. Many compilers will try to optimize
the code that they produce, to eliminate useless instructions, to avoid duplicate 
calculations, etc. Such an optimization often means replacing a piece of code M 
by another piece of code N, without necessarily knowing much about the context 
in which M is used. Such a replacement is safe if M and N are operationally 
equivalent. 

On the other hand, operational equivalence is a somewhat problematic notion. The
problem is that the concept is not stable under adding new language features. It
can happen that two terms, M and N, are operationally equivalent, but when a
new feature is added to the language, they become inequivalent, even if M and N
do not use the new feature. The reason is that operational equivalence is defined in
terms of contexts. Adding new features to a language also means that there will
be new contexts, and these new contexts might be able to distinguish M and N.

This can be a problem in practice. Certain compiler optimizations might be sound 
for a sequential language, but might become unsound if new language features 
are added. Code that used to be correct might suddenly become incorrect if used 
in a richer environment. For example, many programs and library functions in C 
assume that they are executed in a single-threaded environment. If this code is 
ported to a multi-threaded environment, it often turns out to be no longer correct, 
and in many cases it must be re-written from scratch. 

11.8 Operational equivalence and parallel or 

Let us now look at a concrete example in PCF. We say that a term POR imple- 
ments the parallel or function if it has the following behavior: 

\[
\begin{array}{ll}
\mathrm{POR}\; T\, P \to^* T, & \text{for all } P, \\
\mathrm{POR}\; N\, T \to^* T, & \text{for all } N, \\
\mathrm{POR}\; F\, F \to^* F.
\end{array}
\]

Note that this in particular implies $\mathrm{POR}\; T\, \Omega = T$ and $\mathrm{POR}\; \Omega\, T = T$, where $\Omega$
is some divergent term. It should be clear why POR is called the "parallel" or:
the only way to achieve such behavior is to evaluate both its arguments in parallel,
and to stop as soon as one argument evaluates to T or both evaluate to F.

Proposition 11.7. POR is not definable in PCF.

We do not give the proof of this fact, but the idea is relatively simple: one proves
by induction that every PCF context $C[-,-]$ with two holes has the following
property: either there exists a term N such that $C[M, M'] = N$ for all M, M'
(i.e., the context does not look at M, M' at all), or else either $C[\Omega, M]$ diverges
for all M, or $C[M, \Omega]$ diverges for all M. Here, again, $\Omega$ is some divergent term
such as $Y(\lambda x.x)$.

Although POR is not definable in PCF, we can define the following term, called 
the POR-tester. 

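\[
\begin{array}{l}
\text{POR-test} = \lambda x.\ \mathrm{if}\ x\,T\,\Omega\ \mathrm{then} \\
\qquad \mathrm{if}\ x\,\Omega\,T\ \mathrm{then} \\
\qquad\qquad \mathrm{if}\ x\,F\,F\ \mathrm{then}\ \Omega \\
\qquad\qquad \mathrm{else}\ T \\
\qquad \mathrm{else}\ \Omega \\
\mathrm{else}\ \Omega
\end{array}
\]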

The POR-tester has the property that POR-test M = T if M implements the
parallel or function, and in all other cases POR-test M diverges. In particular,
since parallel or is not definable in PCF, we have that POR-test M diverges, for all
PCF terms M. Thus, when applied to any PCF term, POR-test behaves precisely
as the function $\lambda x.\Omega$ does. One can make this into a rigorous argument that shows
that POR-test and $\lambda x.\Omega$ are operationally equivalent:

\[ \text{POR-test} =_{op} \lambda x.\Omega \quad \text{(in PCF)}. \]

Now, suppose we want to define an extension of PCF called parallel PCF. It 
is defined in exactly the same way as PCF, except that we add a new primitive 
function POR, and small-step reduction rules

\[
\frac{M \to M' \qquad N \to N'}{\mathrm{POR}\; M\, N \to \mathrm{POR}\; M'\, N'}
\qquad
\mathrm{POR}\; T\, N \to T
\qquad
\mathrm{POR}\; M\, T \to T
\qquad
\mathrm{POR}\; F\, F \to F
\]

Parallel PCF enjoys many of the same properties as PCF; for instance,
Lemmas 11.1-11.3 and Proposition 11.4 continue to hold for it.

But notice that

\[ \text{POR-test} \neq_{op} \lambda x.\Omega \quad \text{(in parallel PCF)}. \]

This is because the context $C[-] = [-]\;\mathrm{POR}$ distinguishes the two terms: clearly,
$C[\text{POR-test}] \Downarrow T$, whereas $C[\lambda x.\Omega]$ diverges.
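The difference between a sequential and a parallel "or" can be made concrete with a small simulation. The sketch below is not PCF: it models a computation as a Python generator that yields once per evaluation step and returns its boolean result, and it approximates divergence by running out of a step budget. The names `value`, `omega`, `step`, `run`, `por`, `seq_or` and `por_test` are all assumptions of this sketch.

```python
def value(b):
    """A computation that takes one step and then converges to b."""
    yield
    return b

def omega():
    """A divergent computation: it keeps taking steps forever."""
    while True:
        yield

def step(comp):
    """Advance comp by one step; return (finished, result)."""
    try:
        next(comp)
    except StopIteration as stop:
        return True, stop.value
    return False, None

def run(comp, fuel=1000):
    """Run comp for at most `fuel` steps; None means no answer within the budget."""
    for _ in range(fuel):
        finished, result = step(comp)
        if finished:
            return result
    return None

def por(a, b):
    """Parallel or: advance both arguments in lockstep, one step each per round."""
    ra = rb = None                      # None means "no result yet"
    while True:
        if ra is None:
            finished, v = step(a)
            if finished:
                ra = v
        if rb is None:
            finished, v = step(b)
            if finished:
                rb = v
        if ra is True or rb is True:
            return True
        if ra is False and rb is False:
            return False
        yield

def seq_or(a, b):
    """Sequential left-to-right or: b is only looked at once a has finished."""
    if (yield from a):
        return True
    return (yield from b)

def por_test(p):
    """The POR-tester: converges to True only if p passes all three tests."""
    if (yield from p(value(True), omega())):
        if (yield from p(omega(), value(True))):
            if (yield from p(value(False), value(False))):
                yield from omega()
            else:
                return True
        else:
            yield from omega()
    else:
        yield from omega()

passes = run(por_test(por))        # the interleaving version passes the tester
stuck = run(por_test(seq_or))      # the sequential version gets stuck on (omega, True)
```

In this model `por` passes the tester, while `seq_or` uses up its entire step budget on the second test. A finite budget can only suggest divergence, never prove it, so the simulation illustrates the statement in the text but does not replace the inductive argument.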

12 Complete partial orders 

12.1 Why are sets not enough, in general? 

As we have seen in Section 10, the interpretation of types as plain sets is quite
sufficient for the simply-typed lambda calculus. However, it is insufficient for a
language such as PCF. Specifically, the problem is the fixpoint operator
$Y : (A \to A) \to A$. It is clear that there are many functions $f : A \to A$ from a set A to
itself that do not have a fixpoint; thus, there is no chance we are going to find an
interpretation for a fixpoint operator in the simple set-theoretic model.

On the other hand, if A and B are types, there are generally many functions
$f : [\![A]\!] \to [\![B]\!]$ in the set-theoretic model that are not definable by lambda terms.
For instance, if $[\![A]\!]$ and $[\![B]\!]$ are infinite sets, then there are uncountably many
functions $f : [\![A]\!] \to [\![B]\!]$; however, there are only countably many lambda terms,
and thus there are necessarily going to be functions that are not the denotation of
any lambda term.

The idea is to put additional structure on the sets that interpret types, and to require 
functions to preserve that structure. This is going to cut down the size of the 
function spaces, decreasing the "slack" between the functions definable in the 
lambda calculus and the functions that exist in the model, and simultaneously 
increasing the chances that additional structure, such as fixpoint operators, might 
exist in the model. 

Complete partial orders are one such structure that is commonly used for this 
purpose. The method is originally due to Dana Scott.

Figure 4: Some posets



12.2 Complete partial orders 

Definition. A partially ordered set or poset is a set X together with a binary
relation $\sqsubseteq$ satisfying

• reflexivity: for all $x \in X$, $x \sqsubseteq x$,

• antisymmetry: for all $x, y \in X$, $x \sqsubseteq y$ and $y \sqsubseteq x$ implies $x = y$,

• transitivity: for all $x, y, z \in X$, $x \sqsubseteq y$ and $y \sqsubseteq z$ implies $x \sqsubseteq z$.

The concept of a partial order differs from a total order in that we do not require
that for any x and y, either $x \sqsubseteq y$ or $y \sqsubseteq x$. Thus, in a partially ordered set it is
permissible to have incomparable elements.

We can often visualize posets, particularly finite ones, by drawing their line
diagrams as in Figure 4. In these diagrams, we put one circle for each element of
X, and we draw an edge from x upward to y if $x \sqsubset y$ and there is no z with
$x \sqsubset z \sqsubset y$. Such line diagrams are also known as Hasse diagrams.
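For a finite set, both the poset axioms and the edges of the line diagram can be checked by brute force. The following sketch uses divisibility on the set {1, 2, 3, 6} as an example; this example and the names `is_partial_order`, `hasse_edges` and `divides` are assumptions of the sketch and do not come from Figure 4.

```python
from itertools import product

def is_partial_order(elems, leq):
    """Check reflexivity, antisymmetry and transitivity on a finite set."""
    reflexive = all(leq(x, x) for x in elems)
    antisymmetric = all(x == y or not (leq(x, y) and leq(y, x))
                        for x, y in product(elems, repeat=2))
    transitive = all(leq(x, z) or not (leq(x, y) and leq(y, z))
                     for x, y, z in product(elems, repeat=3))
    return reflexive and antisymmetric and transitive

def hasse_edges(elems, leq):
    """Pairs (x, y) with x strictly below y and no element strictly in between."""
    def below(x, y):
        return x != y and leq(x, y)
    return [(x, y) for x in elems for y in elems
            if below(x, y) and not any(below(x, z) and below(z, y) for z in elems)]

def divides(x, y):
    return y % x == 0

elems = [1, 2, 3, 6]
edges = hasse_edges(elems, divides)
```

Divisibility is a partial order on this set, and 2 and 3 are incomparable. The computed edges are (1, 2), (1, 3), (2, 6) and (3, 6); the pair (1, 6) is not an edge because 2 lies strictly between 1 and 6.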

The idea behind using a partial order to denote computational values is that $x \sqsubseteq y$
means that x is less defined than y. For instance, if a certain term diverges, then
its denotation will be less defined than, or below, that of a term that has a definite
value. Similarly, a function is more defined than another if it converges on more
inputs.

Another important idea in using posets for modeling computational values is that
of approximation. We can think of some infinite computational object (such as an
infinite stream) as a limit of successive finite approximations (such as longer
and longer finite streams). Thus we also read $x \sqsubseteq y$ as x approximates y. A
complete partial order is a poset in which every countable chain of increasing
elements approximates something.

Definition. Let X be a poset and let $A \subseteq X$ be a subset. We say that $x \in X$ is
an upper bound for A if $a \sqsubseteq x$ for all $a \in A$. We say that x is a least upper bound
for A if x is an upper bound, and whenever y is also an upper bound, then $x \sqsubseteq y$.
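On a finite poset, upper bounds and least upper bounds can be computed directly from this definition. The sketch below again uses divisibility as the order; the two example sets and the function names are assumptions chosen for illustration.

```python
def upper_bounds(elems, leq, subset):
    """All elements of the poset that lie above every member of subset."""
    return [x for x in elems if all(leq(a, x) for a in subset)]

def least_upper_bound(elems, leq, subset):
    """The least upper bound of subset, or None if there is none."""
    bounds = upper_bounds(elems, leq, subset)
    for x in bounds:
        if all(leq(x, y) for y in bounds):
            return x
    return None

def divides(x, y):
    return y % x == 0

with_lub = least_upper_bound([1, 2, 3, 6], divides, [2, 3])
without_lub = least_upper_bound([1, 2, 3, 12, 18], divides, [2, 3])
```

In the first poset the subset {2, 3} has least upper bound 6. In the second poset the upper bounds of {2, 3} are 12 and 18, which are incomparable, so no least upper bound exists and the function returns `None`.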

Definition. An $\omega$-chain in a poset X is a sequence of elements $x_0, x_1, x_2, \ldots$
such that

\[ x_0 \sqsubseteq x_1 \sqsubseteq x_2 \sqsubseteq \ldots \]

Definition. A complete partial order (cpo) is a poset such that every $\omega$-chain of
elements has a least upper bound.

If $x_0, x_1, x_2, \ldots$ is an $\omega$-chain of elements in a cpo, we write $\bigvee_{i \in \mathbb{N}} x_i$ for its least
upper bound. We also call the least upper bound the limit of the $\omega$-chain.

Not every poset is a cpo. In Figure 4, the poset labeled $\omega$ is not a cpo, because the
evident $\omega$-chain does not have a least upper bound (in fact, it has no upper bound
at all). The other posets shown in Figure 4 are cpo's.

12.3 Properties of limits 

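Proposition 12.1. 1. Monotonicity. Suppose $\{x_i\}_i$ and $\{y_i\}_i$ are $\omega$-chains in
a cpo C, such that $x_i \sqsubseteq y_i$ for all i. Then

\[ \bigvee_i x_i \sqsubseteq \bigvee_i y_i. \]

2. Exchange. Suppose $\{x_{ij}\}_{i,j \in \mathbb{N}}$ is a doubly monotone double sequence of
elements of a cpo C, i.e., whenever $i \leq i'$ and $j \leq j'$, then $x_{ij} \sqsubseteq x_{i'j'}$.
Then

\[ \bigvee_{i \in \mathbb{N}} \bigvee_{j \in \mathbb{N}} x_{ij} = \bigvee_{j \in \mathbb{N}} \bigvee_{i \in \mathbb{N}} x_{ij} = \bigvee_{k \in \mathbb{N}} x_{kk}. \]

In particular, all limits shown are well-defined.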
Exercise 34. Prove Proposition 12.1.

12.4 Continuous functions

If we model data types as cpo's, it is natural to model algorithms as functions 
from cpo's to cpo's. These functions are subject to two constraints: they have to 
be monotone and continuous. 

Definition. A function $f : C \to D$ between posets C and D is said to be
monotone if for all $x, y \in C$,

\[ x \sqsubseteq y \implies f(x) \sqsubseteq f(y). \]

A function $f : C \to D$ between cpo's C and D is said to be continuous if it is
monotone and it preserves least upper bounds of $\omega$-chains, i.e., for all $\omega$-chains
$\{x_i\}_{i \in \mathbb{N}}$ in C,

\[ f(\bigvee_{i \in \mathbb{N}} x_i) = \bigvee_{i \in \mathbb{N}} f(x_i). \]

The intuitive explanation for the monotonicity requirement is that information is 
"positive": more information in the input cannot lead to less information in the 
output of an algorithm. The intuitive explanation for the continuity requirement is 
that any particular output of an algorithm can only depend on a finite amount of 
input. 
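On a finite poset every $\omega$-chain is eventually constant, so monotone and continuous functions coincide there, and monotonicity can be checked by enumeration. The sketch below does this for the lifted booleans, assuming the usual flat order in which the bottom element lies below T and F and these two are incomparable. The string encoding and all names are assumptions of the sketch.

```python
from itertools import product

BOT, T, F = "bot", "T", "F"
B = [BOT, T, F]

def leq(x, y):
    """Flat order: bot is below everything, T and F are incomparable."""
    return x == BOT or x == y

def is_monotone(f):
    """f is a dict from B to B; check x <= y implies f[x] <= f[y]."""
    return all(leq(f[x], f[y]) for x in B for y in B if leq(x, y))

# All 27 set-theoretic functions from B to B, as dictionaries.
functions = [dict(zip(B, outputs)) for outputs in product(B, repeat=3)]
monotone = [f for f in functions if is_monotone(f)]

# A function that answers T on a divergent input but F on a convergent one.
detects_divergence = {BOT: T, T: F, F: F}
```

Of the 27 functions, 11 are monotone. The function `detects_divergence` is not among them: it would have to produce output from an input that carries no information and then retract that output when the input becomes more defined.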



12.5 Pointed cpo's and strict functions 

Definition. A cpo is said to be pointed if it has a least element. The least element
is usually denoted $\bot$ and pronounced "bottom". All cpo's shown in Figure 4 are
pointed.

A continuous function between pointed cpo's is said to be strict if it preserves the
bottom element.



12.6 Products and function spaces 

If C and D are cpo's, then their cartesian product $C \times D$ is also a cpo, with the
pointwise order given by $(x, y) \sqsubseteq (x', y')$ iff $x \sqsubseteq x'$ and $y \sqsubseteq y'$. Least upper
bounds are also given pointwise, thus

\[ \bigvee_i (x_i, y_i) = (\bigvee_i x_i, \bigvee_i y_i). \]

Proposition 12.2. The first and second projections, $\pi_1 : C \times D \to C$ and
$\pi_2 : C \times D \to D$, are continuous functions. Moreover, if $f : E \to C$ and $g : E \to D$
are continuous functions, then so is the function $h : E \to C \times D$ given by
$h(z) = (f(z), g(z))$.
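The pointwise order and the pairing construction can be tried out on a small finite example. The sketch below takes both factors to be the lifted booleans with the flat order, encoded with `None` as the bottom element; since everything is finite, checking monotonicity is enough to establish continuity. The encoding and the names are assumptions made for this illustration.

```python
from itertools import product

BOT = None                      # stands for the bottom element
B = [BOT, True, False]

def leq(x, y):
    """Flat order on the lifted booleans."""
    return x is BOT or x == y

def leq_pair(p, q):
    """Pointwise order on the product B x B."""
    return leq(p[0], q[0]) and leq(p[1], q[1])

def is_monotone(h, dom, leq_dom, leq_cod):
    return all(leq_cod(h(x), h(y)) for x in dom for y in dom if leq_dom(x, y))

def fst(p):
    return p[0]

def snd(p):
    return p[1]

def pairing(f, g):
    """The function z |-> (f(z), g(z))."""
    return lambda z: (f(z), g(z))

def identity(z):
    return z

def strict_not(z):
    return BOT if z is BOT else (not z)

pairs = list(product(B, repeat=2))
h = pairing(identity, strict_not)
```

Both projections are monotone on the nine pairs, and `h`, the pairing of two monotone functions, is monotone as a function from B to the product.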

If C and D are cpo's, then the set of continuous functions $f : C \to D$ forms a cpo,
denoted $D^C$. The order is given pointwise: given two functions $f, g : C \to D$,
we say that

\[ f \sqsubseteq g \quad \text{iff} \quad \text{for all } x \in C,\ f(x) \sqsubseteq g(x). \]

Proposition 12.3. The set $D^C$ of continuous functions from C to D, together with
the order just defined, is a complete partial order.

Proof. Clearly the set $D^C$ is partially ordered. What we must show is that least
upper bounds of $\omega$-chains exist. Given an $\omega$-chain $f_0, f_1, \ldots$ in $D^C$, we define
$g \in D^C$ to be the pointwise limit, i.e.,

\[ g(x) = \bigvee_{i \in \mathbb{N}} f_i(x), \]

for all $x \in C$. Note that $\{f_i(x)\}_i$ does indeed form an $\omega$-chain in D, so that g is a
well-defined function. We claim that g is the least upper bound of $\{f_i\}_i$. First we
need to show that g is indeed an element of $D^C$. To see that g is monotone, we
use Proposition 12.1(1) and calculate, for any $x \sqsubseteq y \in C$,

\[ g(x) = \bigvee_i f_i(x) \sqsubseteq \bigvee_i f_i(y) = g(y). \]

To see that g is continuous, we use Proposition 12.1(2) and calculate, for any
$\omega$-chain $x_0, x_1, \ldots$ in C,

\[ g(\bigvee_j x_j) = \bigvee_i f_i(\bigvee_j x_j) = \bigvee_i \bigvee_j f_i(x_j) = \bigvee_j \bigvee_i f_i(x_j) = \bigvee_j g(x_j). \]

Finally, we must show that g is the least upper bound of the $\{f_i\}_i$. Clearly, $f_i \sqsubseteq g$
for all i, so that g is an upper bound. Now suppose $h \in D^C$ is any other upper
bound of $\{f_i\}_i$. Then for all x and all i, $f_i(x) \sqsubseteq h(x)$. Since $g(x)$ was defined to be the
least upper bound of $\{f_i(x)\}_i$, we then have $g(x) \sqsubseteq h(x)$. Since this holds for all
x, we have $g \sqsubseteq h$. Thus g is indeed the least upper bound. □

Exercise 35. Recall the cpo B from Figure 4. The cpo $B^B$ is also shown in
Figure 4. Its 11 elements correspond to the 11 continuous functions from B to B.
Label the elements of $B^B$ with the functions they correspond to.

Proposition 12.4. The application function $D^C \times C \to D$, which maps $(f, x)$ to
$f(x)$, is continuous.

Proposition 12.5. Continuous functions can be continuously curried and
uncurried. In other words, if $f : C \times D \to E$ is a continuous function, then
$f^* : C \to E^D$, defined by $f^*(x)(y) = f(x, y)$, is well-defined and continuous.
Conversely, if $g : C \to E^D$ is a continuous function, then $g_* : C \times D \to E$,
defined by $g_*(x, y) = g(x)(y)$, is well-defined and continuous. Moreover, $(f^*)_* = f$
and $(g_*)^* = g$.
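The two operations of Proposition 12.5 are the familiar currying and uncurrying of functional programming. The following sketch shows them on ordinary Python functions and checks the two round-trip equations on sample arguments. It says nothing about continuity; the names `curry` and `uncurry` and the sample functions are assumptions of the sketch.

```python
def curry(f):
    """From f(x, y) to the function x |-> (y |-> f(x, y))."""
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    """From g(x)(y) to the function (x, y) |-> g(x)(y)."""
    return lambda x, y: g(x)(y)

def f(x, y):
    return 10 * x + y

def g(x):
    return lambda y: x - y

round_trip_f = uncurry(curry(f))
round_trip_g = curry(uncurry(g))
```

On every argument, `round_trip_f` agrees with `f` and `round_trip_g` agrees with `g`, which is the content of the last sentence of the proposition.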

12.7 The interpretation of the simply-typed lambda calculus in 
complete partial orders 

The interpretation of the simply-typed lambda calculus in cpo's resembles the set- 
theoretic interpretation, except that types are interpreted by cpo's instead of sets, 
and typing judgments are interpreted as continuous functions. 

For each basic type $\iota$, assume that we have chosen a pointed cpo $S_\iota$. We can then
associate a pointed cpo $[\![A]\!]$ to each type A recursively:

\[
\begin{array}{rcl}
[\![\iota]\!] &=& S_\iota \\
[\![A \to B]\!] &=& [\![B]\!]^{[\![A]\!]} \\
[\![A \times B]\!] &=& [\![A]\!] \times [\![B]\!]
\end{array}
\]

Typing judgments are now interpreted as continuous functions

\[ [\![A_1]\!] \times \ldots \times [\![A_n]\!] \to [\![B]\!] \]

in precisely the same way as they were defined for the set-theoretic interpretation.
The only thing we need to check, at every step, is that the function defined is
indeed continuous. For variables, this follows from the fact that projections of
cartesian products are continuous (Proposition 12.2). For applications, we use the
fact that the application function of cpo's is continuous (Proposition 12.4), and for
lambda-abstractions, we use the fact that currying is a well-defined, continuous
operation (Proposition 12.5). Finally, the continuity of the maps associated with
products and projections follows from Proposition 12.2.

Proposition 12.6 (Soundness and Completeness). The interpretation of the
simply-typed lambda calculus in pointed cpo's is sound and complete with respect to the
$\lambda\beta\eta$-calculus.

12.8 Cpo's and fixpoints

One of the reasons, mentioned in the introduction to this section, for using cpo's
instead of sets for the interpretation of the simply-typed lambda calculus is that
cpo's admit fixpoints, and thus they can be used to interpret an extension of the
lambda calculus with a fixpoint operator.

Proposition 12.7. Let C be a pointed cpo and let $f : C \to C$ be a continuous
function. Then f has a least fixpoint.

Proof. Define $x_0 = \bot$ and $x_{i+1} = f(x_i)$, for all $i \in \mathbb{N}$. The resulting sequence
$\{x_i\}_i$ is an $\omega$-chain, because clearly $x_0 \sqsubseteq x_1$ (since $x_0$ is the least element), and
if $x_i \sqsubseteq x_{i+1}$, then $f(x_i) \sqsubseteq f(x_{i+1})$ by monotonicity, hence $x_{i+1} \sqsubseteq x_{i+2}$. It
follows by induction that $x_i \sqsubseteq x_{i+1}$ for all i. Let $x = \bigvee_i x_i$ be the limit of this $\omega$-chain.
Then using continuity of f, we have

\[ f(x) = f(\bigvee_i x_i) = \bigvee_i f(x_i) = \bigvee_i x_{i+1} = x. \]

To prove that it is the least fixpoint, let y be any other fixpoint, i.e., let $f(y) = y$.
We prove by induction that for all i, $x_i \sqsubseteq y$. For $i = 0$ this is trivial because
$x_0 = \bot$. Assume $x_i \sqsubseteq y$, then $x_{i+1} = f(x_i) \sqsubseteq f(y) = y$. It follows that y is an
upper bound for $\{x_i\}_i$. Since x is, by definition, the least upper bound, we have
$x \sqsubseteq y$. Since y was arbitrary, x is below any fixpoint, hence x is the least fixpoint
of f. □
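The chain $\bot, f(\bot), f(f(\bot)), \ldots$ used in this proof can be computed directly when it stabilises after finitely many steps. The sketch below does this for partial functions on the numbers 0 to 5, represented as Python dictionaries and ordered by inclusion of their graphs, so that the empty dictionary plays the role of $\bot$. The functional `fact_step`, the bound 6 and all names are assumptions chosen for illustration.

```python
def least_fixpoint(f, bottom, max_steps=1000):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until the chain stabilises."""
    x = bottom
    for _ in range(max_steps):
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt
    raise RuntimeError("chain did not stabilise within max_steps")

def fact_step(g, bound=6):
    """One unfolding of the factorial recursion, applied to a partial function g.

    The result is defined at 0, and at n whenever g is defined at n - 1.
    """
    h = {0: 1}
    for n in range(1, bound):
        if n - 1 in g:
            h[n] = n * g[n - 1]
    return h

# The first few elements of the chain: each one extends the previous one.
chain = [{}]
for _ in range(3):
    chain.append(fact_step(chain[-1]))

factorial_table = least_fixpoint(fact_step, {})
```

The chain starts with the nowhere-defined function, then the function defined only at 0, then at 0 and 1, and so on. After six steps it reaches the factorial function on 0 to 5, which is a fixpoint of `fact_step`. Without the bound the chain would never stabilise, and the least fixpoint would only exist as the limit of the chain.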

If $f : C \to C$ is any continuous function, let us write $f^\dagger$ for its least fixpoint.
We claim that $f^\dagger$ depends continuously on f, i.e., that $\dagger : C^C \to C$ defines a
continuous function.

Proposition 12.8. The function $\dagger : C^C \to C$, which assigns to each continuous
function $f \in C^C$ its least fixpoint $f^\dagger \in C$, is continuous.

Exercise 36. Prove Proposition 12.8.

Thus, if we add to the simply-typed lambda calculus a family of fixpoint
operators $Y_A : (A \to A) \to A$, the resulting extended lambda calculus can then be
interpreted in cpo's by letting

\[ [\![Y_A]\!] = \dagger : [\![A]\!]^{[\![A]\!]} \to [\![A]\!]. \]

12.9 Example: Streams

Consider streams of characters from some alphabet A. Let $A^{\leq\omega}$ be the set of finite
or infinite sequences of characters. We order $A^{\leq\omega}$ by the prefix ordering: if s and t
are (finite or infinite) sequences, we say $s \sqsubseteq t$ if s is a prefix of t, i.e., if there exists
a sequence s' such that $t = ss'$. Note that if $s \sqsubseteq t$ and s is an infinite sequence,
then necessarily $s = t$, i.e., the infinite sequences are the maximal elements with
respect to this order.
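For finite sequences the prefix ordering is easy to experiment with. The sketch below represents finite sequences as Python strings and builds the first few finite approximations of the infinite stream that repeats a given block of characters forever. The infinite stream itself is never constructed; the names `is_prefix` and `approximations` are assumptions of the sketch.

```python
from itertools import cycle, islice

def is_prefix(s, t):
    """The prefix ordering on finite sequences."""
    return t[:len(s)] == s

def approximations(chars, n):
    """The prefixes of length 0..n of the stream repeating `chars` forever."""
    stream = "".join(islice(cycle(chars), n))
    return [stream[:k] for k in range(n + 1)]

chain = approximations("ab", 4)
```

The list `chain` consists of the empty sequence followed by a, ab, aba and abab, and each entry is a prefix of the next. The sequences ab and ba are incomparable: neither is a prefix of the other.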

Exercise 37. Prove that the set $A^{\leq\omega}$ forms a cpo under the prefix ordering.

Exercise 38. Consider an automaton that reads characters from an input stream
and writes characters to an output stream. For each input character read, it can
write zero, one, or more output characters. Discuss how such an automaton gives
rise to a continuous function from $A^{\leq\omega}$ to $A^{\leq\omega}$. In particular, explain the
meaning of monotonicity and continuity in this context. Give some examples.



13 Denotational semantics of PCF 

The denotational semantics of PCF is defined in terms of cpo's. It extends the cpo
semantics of the simply-typed lambda calculus. Again, we assign a cpo $[\![A]\!]$ to
each PCF type A, and a continuous function

\[ [\![\Gamma \vdash M : B]\!] : [\![\Gamma]\!] \to [\![B]\!] \]

to every PCF typing judgment. The interpretation is defined in precisely the same
way as for the simply-typed lambda calculus. The interpretation for the
PCF-specific terms is shown in Table 10. Recall that B and N are the cpo's of lifted
booleans and lifted natural numbers, respectively, as shown in Figure 4.
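The clauses for the arithmetic and conditional terms can be mimicked on lifted values in a few lines of Python. In this sketch `None` stands for the bottom element, Python integers and booleans stand for the proper elements of the lifted naturals and booleans, and the conditional selects the "then" branch when the test denotes T. The names `sem_succ`, `sem_pred`, `sem_iszero` and `sem_if` are assumptions; the fixpoint clause is not covered.

```python
BOT = None   # stands for the bottom element of a lifted domain

def sem_succ(m):
    return BOT if m is BOT else m + 1

def sem_pred(m):
    if m is BOT:
        return BOT
    return 0 if m == 0 else m - 1

def sem_iszero(m):
    if m is BOT:
        return BOT
    return m == 0

def sem_if(m, n, p):
    """Strict in the test m, but not in the branches n and p."""
    if m is BOT:
        return BOT
    return n if m else p

example = sem_if(sem_iszero(sem_pred(0)), sem_succ(2), BOT)
```

The value `example` is 3. The test denotes T, so the bottom element in the unused branch does not affect the result, whereas an undefined test makes the whole conditional undefined.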

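Definition. Two PCF terms M and N of equal types are denotationally
equivalent, in symbols $M =_{den} N$, if $[\![M]\!] = [\![N]\!]$. We also write $M \sqsubseteq_{den} N$ if
$[\![M]\!] \sqsubseteq [\![N]\!]$.

Types:

\[ [\![\mathrm{bool}]\!] = B, \qquad [\![\mathrm{nat}]\!] = N \]

Terms:

\[
\begin{array}{rcl}
[\![T]\!] &=& T \in B \\[1ex]
[\![F]\!] &=& F \in B \\[1ex]
[\![\mathrm{zero}]\!] &=& 0 \in N \\[1ex]
[\![\mathrm{succ}(M)]\!] &=& \begin{cases} \bot & \text{if } [\![M]\!] = \bot, \\ n+1 & \text{if } [\![M]\!] = n \end{cases} \\[3ex]
[\![\mathrm{pred}(M)]\!] &=& \begin{cases} \bot & \text{if } [\![M]\!] = \bot, \\ 0 & \text{if } [\![M]\!] = 0, \\ n & \text{if } [\![M]\!] = n+1 \end{cases} \\[4ex]
[\![\mathrm{iszero}(M)]\!] &=& \begin{cases} \bot & \text{if } [\![M]\!] = \bot, \\ T & \text{if } [\![M]\!] = 0, \\ F & \text{if } [\![M]\!] = n+1 \end{cases} \\[4ex]
[\![\mathrm{if}\ M\ \mathrm{then}\ N\ \mathrm{else}\ P]\!] &=& \begin{cases} \bot & \text{if } [\![M]\!] = \bot, \\ [\![N]\!] & \text{if } [\![M]\!] = T, \\ [\![P]\!] & \text{if } [\![M]\!] = F \end{cases} \\[4ex]
[\![Y(M)]\!] &=& [\![M]\!]^\dagger
\end{array}
\]

Table 10: Cpo semantics of PCF

13.1 Soundness and adequacy

We have now defined the three notions of equivalence on terms: $=_{ax}$, $=_{op}$, and
$=_{den}$. In general, one does not expect the three equivalences to coincide. For
example, any two divergent terms are operationally equivalent, but there is no
reason why they should be axiomatically equivalent. Also, the POR-tester and
the term $\lambda x.\Omega$ are operationally equivalent in PCF, but they are not denotationally
equivalent (since a function representing POR clearly exists in the cpo semantics).
For general terms M and N, one has the following property: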

Theorem 13.1 (Soundness). For PCF terms M and N, the following implications
hold:

\[ M =_{ax} N \implies M =_{den} N \implies M =_{op} N. \]

Soundness is a very useful property, because $M =_{ax} N$ is in general easier to
prove than $M =_{den} N$, and $M =_{den} N$ is in turn easier to prove than $M =_{op} N$.
Thus, soundness gives us a powerful proof method: to prove that two terms are
operationally equivalent, it suffices to show that they are equivalent in the cpo
semantics (if they are), or even that they are axiomatically equivalent.

As the above examples show, the converse implications are not in general true.
However, the converse implications hold if the terms M and N are closed and
of observable type, and if N is a value. This property is called computational
adequacy. Recall that a program is a closed term of observable type, and a result
is a closed value of observable type.

Theorem 13.2 (Computational Adequacy). If M is a program and V is a result,
then

\[ M =_{ax} V \iff M =_{den} V \iff M =_{op} V. \]

Proof. First note that the small-step semantics is contained in the axiomatic
semantics, i.e., if $M \to N$, then $M =_{ax} N$. This is easily shown by induction on
derivations of $M \to N$.

To prove the theorem, by soundness, it suffices to show that $M =_{op} V$ implies
$M =_{ax} V$. So assume $M =_{op} V$. Since $V \Downarrow V$ and V is of observable type, it
follows that $M \Downarrow V$. Therefore $M \to^* V$ by Proposition 11.6. But this already
implies $M =_{ax} V$, and we are done. □



13.2 Full abstraction 

We have already seen that the operational and denotational semantics do not
coincide for PCF, i.e., there are some terms such that $M =_{op} N$ but $M \neq_{den} N$.
Examples of such terms are POR-test and $\lambda x.\Omega$.

But of course, the particular denotational semantics that we gave to PCF is not the
only possible denotational semantics. One can ask whether there is a better one. 
For instance, instead of cpo's, we could have used some other kind of mathemati- 
cal space, such as a cpo with additional structure or properties, or some other kind 
of object altogether. The search for good denotational semantics is a subject of 
much research. The following terminology helps in defining precisely what is a 
"good" denotational semantics. 

Definition. A denotational semantics is called fully abstract if for all terms M
and N,

\[ M =_{den} N \iff M =_{op} N. \]

If the denotational semantics involves a partial order (such as a cpo semantics), it
is also called order fully abstract if

\[ M \sqsubseteq_{den} N \iff M \sqsubseteq_{op} N. \]

The search for a fully abstract denotational semantics for PCF was an open prob- 
lem for a very long time. Milner proved that there could be at most one such 
fully abstract model in a certain sense. This model has a syntactic description 
(essentially the elements of the model are PCF terms), but for a long time, no 
satisfactory semantic description was known. The problem has to do with sequen- 
tiality: a fully abstract model for PCF must be able to account for the fact that 
certain parallel constructs, such as parallel or, are not definable in PCF. Thus, the 
model should consist only of "sequential" functions. Berry and others developed
"stable domain theory", which is based on cpo's with additional
properties intended to capture sequentiality. This research led to many interesting
results, but the model still failed to be fully abstract.

Finally, in 1992, two competing teams of researchers, Abramsky, Jagadeesan and 
Malacaria, and Hyland and Ong, succeeded in giving a fully abstract semantics 
for PCF in terms of games and strategies. Games capture the interaction between 
a player and an opponent, or between a program and its environment. By consid- 
ering certain kinds of "history-free" strategies, it is possible to capture the notion 
of sequentiality in just the right way to match PCF. In the last decade, game se- 
mantics has been extended to give fully abstract semantics to a variety of other 
programming languages, including, for instance, Algol-like languages. 

Finally, it is interesting to note that the problem with "parallel or" is essentially 
the only obstacle to full abstraction for the cpo semantics. As soon as one adds 
"parallel or" to the language, the semantics becomes fully abstract. 

Theorem 13.3. The cpo semantics is fully abstract for parallel PCF.